• Open

Would disruptive denial-of-service AI be a viable option in internet malpractice, since it's progressed so much?
    title submitted by /u/Icy_Independence_125 [link] [comments]  ( 8 min )

  • Open

    Afro Pink
    submitted by /u/Akumetsu_971 [link] [comments]  ( 8 min )
How could zero-point energy help AI go to the next level?
I am curious about this mainly because many people are talking about UAP! The technology that powers these UAP is most likely not oil, gas, or batteries, but more likely zero-point energy. My main question, like I said, is: how could it take AI to heights where it's never been before? submitted by /u/jdgementdragonotk [link] [comments]  ( 8 min )
    What are some GitHub security best practices?
It seems like about 90% of the stuff happening in AI is only accessible via GitHub. I'm probably just being overly cautious, but downloading something from such a public place is just not something I am currently comfortable with. What are your thoughts on this? Are there precautions you take that I should be aware of before venturing into this territory? Or is it just generally considered pretty safe and nothing to worry about much? submitted by /u/gcubed [link] [comments]  ( 8 min )
    Offline music AI generator
Hi all, I'm searching for a music-generation AI for my own game, so I want to find an AI without copyright restrictions. But every model that I found was online, with copyright rules. Is there any offline, or better, open-source AI model? submitted by /u/krakotay1 [link] [comments]  ( 8 min )
An open-source AI on a European server, reading PDFs (texts), answering questions in different languages - please help
Hey all - I am looking for an open-source chatbot. It will get a bunch of PDF documents (training data). This data should be searchable, and the bot should be able to answer questions in different languages. So far, so easy. We're in Germany, which means we have hard regulations for servers that are NOT hosted in the EU (god, I really hate to write that). So we need a model that can be installed here, trained here, and that finally does its work here. If you have any advice, I would be really thankful! submitted by /u/myreddit333 [link] [comments]  ( 8 min )
    How is it possible that there were no LLM AIs, then there was ChatGPT, now there are dozens of similar products?
    Like, didn’t ChatGPT need a whole company in stealth mode for years, with hundreds of millions of investment? How is it that they release their product and then overnight there are competitors – and not just from the massive tech companies? submitted by /u/Aquillyne [link] [comments]  ( 8 min )
    The Rockstars of the AI World
AGI: Brainiac robots will be like the ultimate multitaskers on steroids. They will possess mind-blowing cognitive abilities, making them the geniuses who can solve complex math problems in their sleep and then casually drop fashion advice that would make Coco Chanel blush. You might catch them pondering the mysteries of the universe or composing symphonies that give Beethoven a run for his money. Are they secretly working as undercover geniuses on the weekends? All we know is that AGI will be among us to challenge our notion of human intelligence and make us wonder if we should start polishing our robot-dance moves to keep up and keep them entertained. submitted by /u/Powerful-Pumpkin-938 [link] [comments]  ( 8 min )
    Using google account data for AI personalization
Brand new to AI and trying to understand possible functionalities, so bear with me if I sound uneducated. I know that you can download all of your Google account data (which I think is probably the largest amount of personalized information that most of us have easy access to; it includes YouTube history, Google Pay transactions, browsing history, and tons of other info). But my question is: after downloading this data, how could you implement it into some localized AI program to improve your life? It knows what restaurants you frequent, items you buy, travel patterns, vacations, email history, etc. I would love to hear people's ideas on this, thanks. submitted by /u/G-Boogie42 [link] [comments]  ( 8 min )
    Any methods for achieving close to elevenlabs quality and inference speed with a local model?
My upcoming master's thesis (on the use of AI models for believable NPCs in video games) will involve the use of text to speech. Currently ElevenLabs has been my go-to for TTS, but the pricing model is quite inconvenient since it's a monthly subscription instead of pay-per-use. I'm sure some of you here are knowledgeable about TTS; it would be great if anyone could point me in a good direction. I want to use TTS to generate natural, human-sounding conversational voices. Ideally I could finetune the model on different voice profiles to get a wide range of voices. I have only started to look into options outside ElevenLabs, but I figured I should ask here before I start diving deep, so I can avoid unnecessary waste of time! If this is not achievable locally, it would be great to know if there are any methods for hosting a model on a cloud compute platform, so I can at least pay per use instead of dealing with the monthly subscription that comes with ElevenLabs. Thank you! submitted by /u/timidavid350 [link] [comments]  ( 9 min )
    ai generated video compilations
Was wondering if there's an AI video tool where you put in keywords and it creates a compilation of gifs/clips based on your text. For example, if you typed in "dog", "car" and "clouds" it would create a video compiled of short clips of dogs, cars and clouds. Sorry if this is a dumb question, I was just curious if this exists haha, thanks! submitted by /u/bonlom [link] [comments]  ( 8 min )
  • Open

    How would one normalize observations in off-policy online reinforcement learning?
Can someone please help with this question? Please let me know if you'd like me to paste the whole question in the description here. Thank you! submitted by /u/Academic-Rent7800 [link] [comments]  ( 8 min )
    Tuning Rewards/Adding More Rewards in RL is a headache
I am working on RL and it is really exciting, but in a lot of cases I don't get the desired behaviours I want. So I either tune the weights of existing rewards or add more reward terms to try to fix that, but later I realized that adding more rewards just makes the problem more complex, and I even get worse performance than before. I also tried hyperparameter tuning, but it is really slow and I don't see much difference compared with my heuristic choices. I read some RL papers and I never figured out where these magic weights come from. Any advice would be very helpful :) submitted by /u/Alchemist1990 [link] [comments]  ( 8 min )
    Causes of RL agent reaching a more optimal policy but not continuing to improve
Hey, I'm developing an RL agent for optimal setpoint control, but I'm experiencing some weird behaviour where the agent reaches a policy that performs better than all previous policies, but doesn't continue improving and instead goes back to getting stuck at a local minimum. I'm assuming this behaviour comes from wrong hyperparameter tuning, but I was wondering if there could be different sources for this behaviour. [Figure: training curve; lower score is more optimal.] I'm training with PPO and had a non-zero value for the entropy coefficient. Edit: When I stop training at the dips and test the agent, it will quite consistently score better. submitted by /u/LeSUTHU [link] [comments]  ( 8 min )
    "Solving math word problems with process- and outcome-based feedback", Uesato et al 2022 {DM}
    submitted by /u/gwern [link] [comments]  ( 8 min )
    Extensions for SAC
I am a starter in reinforcement learning and stumbled across SAC. While all the other off-policy algorithms seem to have extensions (DQN/DDQN, DDPG/TD3), I am wondering what extensions for SAC are worth having a look at. I already found two papers (DR3 and TQC) but I'm not experienced enough to evaluate them, so I thought about building them and comparing them to others. Would be nice to hear someone's opinion :) submitted by /u/MChiefMC [link] [comments]  ( 8 min )
    PPO agent for "2048": help requested
Hi folks, I have started to dip my toes into reinforcement learning recently, reproducing some basic algorithms (DQN, REINFORCE, DDPG) on the easy gymnasium examples with success. I have now started a new project with a custom environment: 2048. It seems to be an interesting target for a deep reinforcement learning agent: the action space is small (at most four directions) but with a large observation space to generalize over using (a) neural network(s). Here's where the problem starts: after implementing a custom environment that follows the typical gymnasium interface and using a slightly adjusted PPO implementation from CleanRL, I cannot get the agent to learn anything at all, even though this specific implementation seems to work just fine on basic gymnasium examples. I am hoping…  ( 10 min )
    How to work with a large action space?
I want to train a single agent for a game using reinforcement learning. I am totally new to this field. In the game, 4 characters can be moved around on a 10x10 board. After a character is moved to a new position, it can build something on one of the grid cells within a certain range of its new position. Each character has a different range of tiles where they can move and where they can build a house. So, in general, the action space is 4x100x100 = 40,000 including all the illegal moves, which is huge. Can you advise any way to shrink the action space? I am using Gymnasium to define the environment. Previously, I thought about using MultiDiscrete or Tuple spaces for the action space, e.g. one dimension for characters, one dimension for 100 tiles to move, and one dimension for 100 tiles to build the house. However, how can I then mask the illegal moves? submitted by /u/naeson [link] [comments]  ( 9 min )
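A common way to handle this (a minimal numpy sketch, not from the post; the environment and policy network are left out, and the function name is my own) is to keep the full factored action space but mask illegal actions by pushing their logits to a large negative value before sampling, so they receive essentially zero probability:

```python
import numpy as np

def masked_sample(logits, legal_mask, rng=np.random.default_rng()):
    """Sample an action index, assigning ~zero probability to illegal actions.

    logits: unnormalized scores for each discrete action
    legal_mask: boolean array, True where the action is legal
    """
    masked = np.where(legal_mask, logits, -1e9)  # illegal actions effectively removed
    z = masked - masked.max()                    # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum()
    return rng.choice(len(logits), p=probs)

# Toy example: 4 actions, only actions 0 and 2 are legal.
logits = np.array([0.5, 2.0, 1.0, -0.3])
mask = np.array([True, False, True, False])
action = masked_sample(logits, mask)
assert mask[action]  # a sampled action is always legal
```

The same trick applies per-dimension of a MultiDiscrete space: compute a legality mask for each dimension given the current state and mask each head's logits independently.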
  • Open

[D] NVIDIA RAPIDS vs. Pandas
Hello ML Team, I am reaching out on behalf of Data Care LLC. We developed a large Kubernetes-based cluster with thousands of A4000 GPUs and are currently conducting a benchmarking experiment to compare NVIDIA RAPIDS against Pandas for various-size workloads that need a bit of scrubbing. Here are the details: Infra 1: Pandas with 6 CPU cores, 16 GB RAM. Infra 2: A4000 with 16 GB VRAM, 16 GB RAM and 6 CPU cores. Synthetic datasets of 0.8 GB, 1.6 GB, 3.2 GB, and 6.4 GB containing errors that need to be scrubbed; you are welcome to define your own dataset. If any of you are interested in participating in this experiment, we would be happy to provide you with the necessary compute resources. To sign up for this experiment, please visit www.thedatacare.com/experiment-rapid We greatly appreciate your participation and encourage you to publish your results to contribute to the community. Thank you, Arun, Data Care LLC submitted by /u/arunpalepu1981 [link] [comments]  ( 9 min )
[DISCUSSION] How much can we trust OpenAI (and other large AI companies) to keep our fine-tuned models and data sets private?
tldr: Do you trust OpenAI or other large AI companies with your data? Do you reckon it's just a matter of time before they find all of the data, so might as well contribute to their research project and benefit from it while you can? Or do you prefer to go the open-source route instead for this reason? Here is my concern: Some of my team members are very high on OpenAI's models, their ease of use, and how smart they are out of the box. Now, OpenAI (relatively recently) published a statement saying that they will not use your fine-tuned models or data sets internally to improve their products, but given their history and the value of these fine-tuned models and their corresponding datasets, I'm uncertain to what extent we are able to trust that they will keep our data private. I like t…  ( 9 min )
    Just was Offered a $130k Machine Learning Engineer Position - Is this Salary too Low? [D]
    I am about to graduate with my master's and I was offered a Machine Learning Engineer role with a salary of $130k. I have data science experience, but not much programming experience. I pushed back on the salary and said I was expecting a little more (I was paid more as a Data Scientist), and the recruiter said the team was taking a chance on me and I need to prove myself before they would raise my salary. I also want to mention I am switching into work that is more coding heavy than what I was doing as a Data Scientist. The job is exactly what I want to do though and the team was really great to talk to. The company has great reviews online and wonderful benefits. I am leaning towards just taking the job but this is my first job interview and I don't want to sell myself short. FYI I live in California, not the Bay Area. submitted by /u/datatastic08200 [link] [comments]  ( 9 min )
    [D] What have been your use cases for LLM autonomous agents?
I've been using GPT for completions on a daily basis for a while now - code completion and search-like chatting, basically. I've recently been playing around with both ChatGPT plugins and LangChain for autonomous-agent-like behavior, and although the idea of the LLM interacting with the environment through API calls or code interpretation seems promising, in practice I haven't found such a useful and usable case for it like completions yet. LangChain's OpenAPI toolkit with its planner/controller agent duo seems to get lost 90% of the time, making it unusable. This happens even with an /api endpoint telling it exactly how to interact with the API and prompt templates suggesting that this endpoint be used to get the API specs. Maybe I'm just not getting it right... As for ChatGPT plugins, other than web search for more updated results I haven't really found a use case where I could not do the same thing with completions. Code Interpreter shaves off a few seconds vs completions and running whatever script it produces locally, but it's not very useful in the face of compliance or privacy requirements of not uploading stuff to OpenAI. For example, I wanted to speed up a work-related video and add a separate audio track to it. I couldn't upload the video to OpenAI as it contained internal work stuff, so I just used completions for an ffmpeg script to do the job and ran it locally. Same thing with transforming or plotting CSV data - can't really upload customer data to OpenAI, so I just get the script and run it locally. Anyway, I can think of a lot of cool use cases for autonomous agents and the like, but I haven't been able to actually use them in my daily routine, unlike text completion. Have you been using autonomous agents successfully and regularly? submitted by /u/osantacruz [link] [comments]  ( 9 min )
    [D] When will the LLM winter come?
Everyone is jumping on the bandwagon of LLMs, and it seems like we should follow suit. But why? LLMs are impressive; they can play chess, emulate a Linux terminal, perform mathematical calculations, convert code, and more. However, in most cases, they perform these tasks poorly, which means they cannot be used as is, at least not for now. While we come across numerous cool demos, the results are often cherry-picked and not applicable to real business use cases. I can only identify a few limited use cases where LLMs truly shine: QA, summarization, and acting as a co-pilot for tasks such as coding, writing, and education. In essence, they excel at seq2seq tasks. There are some notable drawbacks that should be acknowledged: Cost: The cost of using LLMs will never be cheaper than specialized solutions. Latency: Building real-time applications with LLMs is extremely challenging. Quality and Accuracy: For now, LLMs lack strong reasoning capabilities. Reliability: Hallucinations are still an issue, and occasionally the model fails to follow instructions. Quotas: Currently, the quotas imposed on LLMs are too low for large-scale production applications. It is possible that most of these problems will be resolved in the future. However, the question remains: how long will it take? Will it be 1 year, 5 years, or even 10 years? Unfortunately, we need to see an ROI this year. Anyway, just want to remind you that there is no free lunch! submitted by /u/___mlm___ [link] [comments]  ( 9 min )
    [D] Regression Problem
    I’m trying to develop a ML model to be able to predict “time series” data. It’s not actually time series, it’s temperature series. But regardless, the order matters. I have 20 sets of tabular data in separate csv’s that each contain 3 columns. Column 1 is temperature from 800 to 1360, column 2 are values I’d like to be able to predict, and column 3 contains values that I’d like to use as input to predict column 2 values at each respective temperature. Because the temperature range is between 800 and 1360, each csv has 560 rows of data (1360 - 800 = 560). So in total, there’s 11,200 rows of data (20 sets * 560 = 11,200 total rows). The data in each column, excluding the temperature column, are second order polynomials. So I’m essentially trying to get from one polynomial (column 3) to another (column 2). What would be the best ML method to achieve this given my problem? I know there’s an underlying pattern but I need ML to help me determine it. Appreciate any help here, thanks!! submitted by /u/cgifted7 [link] [comments]  ( 9 min )
    Seeking Participants for AI-related Survey[R]
    I am currently working on my IB Extended Essay, and I would greatly appreciate your help in gathering valuable insights from individuals knowledgeable in the field of AI. The purpose of my survey is to understand the perspectives of AI enthusiasts and professionals like you. If you have a few minutes to spare, I kindly request you to participate in my survey. Your input will contribute significantly to my research and help me gain a deeper understanding of the topic. The survey covers various aspects of AI, and your expertise will be invaluable in shaping the results. Survey Link: https://forms.gle/PVGrRbPLTpZRbbpL9 Rest assured that all responses will be kept confidential and only used for academic purposes. Additionally, feel free to share this survey with others who might be interested or knowledgeable in the field. Thank you in advance for your time and contributions! Your participation will greatly aid in the successful completion of my IB Extended Essay. submitted by /u/KVNG_Winston [link] [comments]  ( 9 min )
    [R] All about evaluating Large language models
I explored my curiosity about how best to evaluate LLMs and LLM applications, and consolidated my thoughts in this article: https://explodinggradients.com/all-about-evaluating-large-language-models submitted by /u/iamikka [link] [comments]  ( 8 min )
    [D] Hacking LangChain for Fun and Profit
    https://blog.kevinhu.me/2023/07/10/hacking-langchain-for-fun-and-profit/ I'm starting a series of blogs to delve into LangChain. Hope this helps anyone who's interested in LLM and building with LangChain. submitted by /u/OLDGUN [link] [comments]  ( 8 min )
    [D] Forcing diversification in similarity search
    I’m using a vector database for storing image embeddings and using it for similarity search. If I pick top ten most similar vectors I can sometimes end up inside of an echo chamber with almost “duplicates” or too similar images. I would like to diversify the results so that all the results are close to the input vector but different between themselves. Are there common patterns/algorithms for this type of diversification? The idea that I have: I want to pick 10 images. I would query the database for a 100 of the most similar embeddings and use a clustering algorithm to cluster them into 10 clusters. Finally, I would pick one image from each cluster. submitted by /u/alkibijad [link] [comments]  ( 8 min )
    [D] Website to get historical price for agriculture commodities?
Hey guys, I want to make price predictions for agricultural commodities such as grain, corn and coffee. I need historical price data like open, low, high, close, and volume. Do you guys know which websites provide this data for free? I tried https://markets.businessinsider.com/ but it was no help. submitted by /u/Classic-Fee6889 [link] [comments]  ( 8 min )
    [P] We built a paper sharing platform. Looking for alpha testers!
https://www.entepond.com We are a group of industry practitioners who got tired of sharing papers via social media platforms. They are not designed for that. It's hard enough to keep track of the trending/latest papers nowadays. So, we built Pond, a paper sharing platform that makes sense. submitted by /u/dockerun [link] [comments]  ( 8 min )
    [D][R] Help with bachelor thesis
Hi all, I am finishing the 3rd year of my UG AI program, and I've already been admitted to an MSc in AI. However, I need some ideas for my thesis, which should be around 30 pages long. I've been doing some research on what I would like to study, but I still haven't found anything I like. I am open to both an applied ML topic and a more theoretical one, so feel free to suggest something you think is appropriate 😄 submitted by /u/AskAmbitious5697 [link] [comments]  ( 8 min )
    [D] MPT-30b or Falcon 40b: which one is the better option
In the open-source community, two models that have gained a lot of popularity are MPT and Falcon. However, I wonder which one is the better LLM. What are your considerations about both models? submitted by /u/paulo_zip [link] [comments]  ( 8 min )
    [Project] Brute force for popular models, Feature Reduction and Scaling Techniques for Classification [Python]
This is my first article, trying something new: brute-forcing popular models on classification problems. Does anybody have suggestions to improve the code, add more features, or another idea? https://baroodz.medium.com/brute-force-for-popular-models-feature-reduction-and-scaling-techniques-for-classification-69be9c0426b9 https://github.com/Barood-cmd/BruteforceML/ All opinions appreciated. submitted by /u/Barood_D [link] [comments]  ( 8 min )
    [D] How does RTX 3060 12GB fare against the RTX 4060 Ti 16GB for
Ok, now that the RTX 4060 Ti 16GB has been launched, this question needs to be asked :) The RTX 4060 Ti 16GB is faster and has newer tensor cores but less memory bandwidth (128-bit, 288 GB/s), while the RTX 3060 12GB is slower (also cheaper) but has more memory bandwidth (192-bit, 360 GB/s), though less memory. Does anyone have experience with the new card for machine learning purposes? Specs: RTX 4060 Ti 16GB - https://www.techpowerup.com/gpu-specs/geforce-rtx-4060-ti-16-gb.c4155 RTX 3060 12GB - https://www.techpowerup.com/gpu-specs/geforce-rtx-3060-12-gb-ga104.c3832 submitted by /u/PhotonAttack [link] [comments]  ( 8 min )
    [D] - Issues with Model Collapse in VAE for Reconstruction Task
I am currently experiencing a suspected model collapse in a Variational Autoencoder (VAE) model I am working with. Below are details on the project setup and the issue at hand: Project Goal: Exploring whether the same VAE architecture can independently handle two tasks - reconstruction and regression - using real-world production line data. Data: Dataset: Approximately 80,000 samples Features: Sensor readings such as flow, temperature, force, etc. (both binary and continuous data) Target for regression: The measurement of a product property at the end of the line (can be divided into approximately 50 classes) Preprocessing: Data normalized between 0 and 1 before training Model Architecture: Encoder: Input layer of 800 and a hidden layer of 600 Latent Space: Size of 400 Decoder: …  ( 10 min )
    [P] Image based financial statements ratio analysis like quick ratio, debt to equity ratio?
I have balance sheets as images; using AWS OCR we extract the text, then format and segment it to find ratios like the quick ratio, current ratio, and debt-to-equity ratio for analysis, using fuzzy string matching. However, sometimes the values for calculating these ratios are present but under different keywords, or the values are there but the total is not present. Is there any literature or code base using an ML approach to do ratio analysis on financial statements? submitted by /u/Priyanshu1_ [link] [comments]  ( 8 min )
    [D] Data Management for Machine Learning Projects
    Hey, r/MachineLearning, it’s Nir from DagsHub 🐶 Data management may seem straightforward, but beneath the surface, it poses complex hurdles that can impact project success. We've been thinking a lot about it lately, particularly within machine learning projects. So we decided to conduct research, talking with companies and individual practitioners about the challenges they face. We wanted to shed light on the often-overlooked challenges that come with this crucial aspect. If you're interested in understanding the intricacies of data management and how it influences ML projects, I invite you to check out our latest blog post where we share our initial results. It's an informative read that aims to foster knowledge-sharing within our community. This is one area we can all learn a lot from each other. Feel free to share your own experiences and insights related to data management challenges and how you overcome them. Here's the link to the blog post: https://dagshub.com/blog/challenges-in-data-management-for-machine-learning/ As always, looking forward to hearing your thoughts! 🤗 submitted by /u/RepresentativeCod613 [link] [comments]  ( 9 min )
    [D] Choosing the right embeddings model
I'm trying to pick the right model to take short stories and find similar short stories based on the story. I was looking through https://huggingface.co/spaces/mteb/leaderboard and trying to figure out what model I should use. Should I be looking at the top clustering models? I was going to use OpenAI's ada-002, but would rather use a better model if possible. submitted by /u/juniorskimbrough [link] [comments]  ( 8 min )
    [R] Adversarial Robust Deep Reinforcement Learning Requires Redefining Robustness
    https://ojs.aaai.org/index.php/AAAI/article/view/26009/25781 submitted by /u/ml_dnn [link] [comments]  ( 8 min )
    For deep learning practitioners in industry, is the workflow always this annoying? [D]
Right now I’m working on a multivariate time series classification problem for a lab I’m working at. The training set is a multivariate time series with data collected at a minutely level. Each time series is a type of positional tracking measurement. The goal is to classify each of the time series and see if they can discriminate between 3 different types of outcomes of interest. The labels are 0, 1, 2. I’m doing this in PyTorch, and I read roughly 8-9 papers. They all seem to point to convolutional neural networks as a starting point. After building the model and setting up the training loops, this whole process feels like maybe 1% model building and 99% debugging, tuning, architecture changing, and tensor shape checking. My day to day has just been stepping through my architecture with …  ( 9 min )
    [D] LLM Multi-Rounds Fine-Tuning
    I have 2 datasets that I want to use for fine-tuning multiple LLMs. Dataset #1 is large and has "raw" text data (many scraped webpages revolving around different topics that an LLM may not have been pre-trained on). Dataset #2 is very small and has selected examples depicting how a topic from Dataset #1 should be presented to the user during chat interactions. I am trying to determine a good strategy for fine-tuning. I could not find any examples of "two-round" fine tuning in a scenario like this. I am thinking about using quantization and training just the last few layers of selected LLM on Dataset #1, and using the LoRA method for Dataset #2. However, I do not have much compute resources. Any tips/suggestions/advice would be greatly appreciated! submitted by /u/ilovejoi36912 [link] [comments]  ( 8 min )
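For the Dataset #2 stage, the core of the LoRA method can be sketched independently of any library: freeze the pretrained weight W and learn only a low-rank update B @ A. This toy numpy version is purely illustrative (class and parameter names are mine; in practice a library such as Hugging Face PEFT wires this into real transformer layers):

```python
import numpy as np

class LoRALinear:
    """y = x @ (W + B @ A).T, with W frozen; only A (r x d_in) and
    B (d_out x r) are trained, so the update has rank at most r."""

    def __init__(self, W, r=4, rng=np.random.default_rng(0)):
        self.W = W                                  # frozen pretrained weight
        self.A = rng.normal(0.0, 0.01, (r, W.shape[1]))
        self.B = np.zeros((W.shape[0], r))          # zero init: no change at start

    def __call__(self, x):
        return x @ (self.W + self.B @ self.A).T
```

With B initialized to zero, the adapted layer starts out exactly equal to the frozen one, and only r * (d_in + d_out) parameters are trained instead of d_in * d_out, which is what makes the method cheap enough for limited compute.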
  • Open

    Visualizing a determinant identity
    The previous post discussed an algorithm developed by Charles Dodgson (better known as Lewis Carroll) for computing determinants. The key identity for proving that Dodgson’s algorithm is correct involves the Desnanot-Jacobi identity from 1841. The identity is intimidating in its symbolic form and yet easy to visualize. In algebraic form the identity says Here a […] Visualizing a determinant identity first appeared on John D. Cook.  ( 5 min )
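The equation in this excerpt did not survive extraction. For reference (reconstructed here from the standard statement of the Desnanot-Jacobi identity, not recovered from the original page), with $M^{i}_{p}$ denoting the $n \times n$ matrix $M$ with row $i$ and column $p$ removed, and $M^{i,j}_{p,q}$ denoting $M$ with rows $i,j$ and columns $p,q$ removed, the identity reads:

```latex
\det(M)\,\det\!\big(M^{1,n}_{1,n}\big)
  = \det\!\big(M^{1}_{1}\big)\,\det\!\big(M^{n}_{n}\big)
  - \det\!\big(M^{1}_{n}\big)\,\det\!\big(M^{n}_{1}\big)
```

Each condensation step in Dodgson's algorithm is one application of this identity, solved for $\det(M)$.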
    How Lewis Carroll computed determinants
    Charles Dodgson, better known by his pen name Lewis Carroll, discovered a method of calculating determinants now known variously as the method of contractants, Dodgson condensation, or simply condensation. The method was devised for ease of computation by hand, but it has features that make it a practical method for computation by machine. Overview The […] How Lewis Carroll computed determinants first appeared on John D. Cook.  ( 6 min )
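As a concrete illustration of the condensation scheme the post describes, here is a minimal sketch (function name mine; it fails when an interior entry of an intermediate matrix is zero, a case Dodgson handled by rearranging rows and columns):

```python
import numpy as np

def dodgson_det(A):
    """Determinant by Dodgson condensation: repeatedly replace the matrix
    by its connected 2x2 minors, dividing entrywise by the interior of the
    matrix from two steps earlier."""
    cur = np.asarray(A, dtype=float)
    prev = np.ones((cur.shape[0] + 1, cur.shape[0] + 1))  # all-ones seed layer
    while cur.shape[0] > 1:
        n = cur.shape[0]
        nxt = np.empty((n - 1, n - 1))
        for i in range(n - 1):
            for j in range(n - 1):
                minor = (cur[i, j] * cur[i + 1, j + 1]
                         - cur[i, j + 1] * cur[i + 1, j])
                nxt[i, j] = minor / prev[i + 1, j + 1]  # divide by interior entry
        prev, cur = cur, nxt
    return cur[0, 0]
```

For example, `dodgson_det([[1, 2, 3], [4, 5, 6], [7, 8, 10]])` returns -3.0, matching the cofactor expansion. Each step only needs 2x2 determinants and one division per entry, which is what made the method attractive for hand computation.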
    Gram matrix
    An elegant algebraic identity says If x is the vector [a b] and y is the vector [c d] then this identity can be written where the dot indicates the usual dot product. I posted this on Twitter the other day. Gram matrix Now suppose that x and y are vectors of any length n. […] Gram matrix first appeared on John D. Cook.  ( 5 min )
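The equations in this excerpt were also lost in extraction. For the two-dimensional case the post sets up, with $x = [a\ b]$ and $y = [c\ d]$, the identity in question is (reconstructed here from the standard result, not recovered from the original page):

```latex
(a^2 + b^2)(c^2 + d^2) = (ac + bd)^2 + (ad - bc)^2,
\qquad\text{i.e.}\qquad
(x \cdot x)(y \cdot y) - (x \cdot y)^2
  = \det\begin{pmatrix} a & c \\ b & d \end{pmatrix}^{2}
```

The left-hand side of the second form is exactly the determinant of the Gram matrix $\begin{pmatrix} x \cdot x & x \cdot y \\ x \cdot y & y \cdot y \end{pmatrix}$, which is how the post generalizes the identity to vectors of any length $n$.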
  • Open

    3 Questions: Honing robot perception and mapping
    Luca Carlone and Jonathan How of MIT LIDS discuss how future robots might perceive and interact with their environment.  ( 8 min )
  • Open

    Design Speed Takes the Lead: Trek Bicycle Competes in Tour de France With Bikes Developed Using NVIDIA GPUs
    NVIDIA RTX is spinning new cycles for designs. Trek Bicycle is using GPUs to bring design concepts to life. The Wisconsin-based company, one of the largest bicycle manufacturers in the world, aims to create bikes with the highest-quality craftsmanship. With its new partner Lidl, an international retailer chain, Trek Bicycle also owns a cycling team, Read article >  ( 7 min )
  • Open

    Renovating computer systems securely and progressively with APRON
This research paper was accepted by the 2023 USENIX Annual Technical Conference (ATC), which is dedicated to advancing the field of systems research. Whether they’re personal computers or cloud instances, it’s crucial to ensure that the computer systems people use every day are reliable and secure. The validity of these systems is critical because if storage […] The post Renovating computer systems securely and progressively with APRON appeared first on Microsoft Research.  ( 10 min )
  • Open

    AIOps above the radar – Using AI to monitor your AI infrastructure
    When an enterprise project is low-profile (“below the radar”), then it is not likely to be the target of bad actors. Similarly, if some part of that project’s infrastructure fails or falters, then the consequences of the problem and/or the urgency of providing a solution are usually manageable. But when a high-profile (“above the radar”)… Read More »AIOps above the radar – Using AI to monitor your AI infrastructure The post AIOps above the radar – Using AI to monitor your AI infrastructure appeared first on Data Science Central.  ( 22 min )
    Security data lakes and the future of organizational security
    Evolving technological advancements have created a far more data-centric world. This has dramatically changed the enterprise landscape, while also creating more data silos. The explosion of cybersecurity tools and mounds of data in modern enterprises have made it difficult to combine data to create a unified view. This has resulted in siloed data that’s also… Read More »Security data lakes and the future of organizational security The post Security data lakes and the future of organizational security appeared first on Data Science Central.  ( 21 min )
    Sentience: AI has demystified human consciousness, intelligence
    There is a recent article, Unraveling the Mystery of Human Consciousness, where it was stated that, “Consciousness makes us capable of experiencing the scent of a rose, the touch of a breeze, the taste of food, the sound of music, and the sight of a sunrise. We also have a unique ability to be aware… Read More »Sentience: AI has demystified human consciousness, intelligence The post Sentience: AI has demystified human consciousness, intelligence appeared first on Data Science Central.  ( 19 min )
    Exploring intelligent search solutions: A comparative analysis of Amazon Kendra integration and large language model crawlers
Amazon Kendra and LLamaIndex can help with knowledge integration but fall short in connecting diverse knowledge sources to enable efficient intelligent search. In this article, we compare the existing solutions and explain how to overcome their limitations using a Google Drive crawler. Companies often face difficulties in consolidating their knowledge base when their data is… Read More »Exploring intelligent search solutions: A comparative analysis of Amazon Kendra integration and large language model crawlers The post Exploring intelligent search solutions: A comparative analysis of Amazon Kendra integration and large language model crawlers appeared first on Data Science Central.  ( 27 min )
    What’s missing from ChatGPT and other LLMs?
    Recent developments in artificial intelligence remind me of the automotive industry in the late 19th and early 20th century. In that case, it took the industry several decades to commit to internal combustion engines. And while that picture was still unclear, there were over 250 different car manufacturers, some of whom were producing steam-powered cars.… Read More »What’s missing from ChatGPT and other LLMs? The post What’s missing from ChatGPT and other LLMs? appeared first on Data Science Central.  ( 21 min )
  • Open

    Google at ACL 2023
Posted by Malaya Jules, Program Manager, Google This week, the 61st annual meeting of the Association for Computational Linguistics (ACL), a premier conference covering a broad spectrum of research areas that are concerned with computational approaches to natural language, is taking place online. As a leader in natural language processing and understanding, and a Diamond Level sponsor of ACL 2023, Google will showcase the latest research in the field with over 50 publications, and active involvement in a variety of workshops and tutorials. Board and Organizing Committee: Dan Garrette. Workshop chairs include: Annie Louis. Publication chairs include: Lei Shu. Program Committee includes: Vinodkumar Prabhakaran, Najoung Kim, Markus Freitag. Spotlight papers: NusaCrowd: O…  ( 93 min )
  • Open

A lightning-fast and developer-friendly Federated Learning SDK
Hello Reddit 👋 I am reaching out because I am one of the community relations coordinators of MetisFL, a federated learning framework. https://github.com/NevronAI/metisfl#readme I would really appreciate it if you checked out and starred our repo and considered using the framework or contributing to it 🤗 The code is the result of several years of research on federated learning and was released just a few weeks ago, so you would be one of the first outside users/contributors! We are currently preparing for the first stable release of our pip package, and any community input would be much appreciated! Please do not hesitate to reach out to us anytime; we are always available on our Slack channel to answer questions about MetisFL, solve technical issues, or chat about the future of tech and AI! Slack is pretty new as well, so you'd be one of the first outside members! Hope to see you in our online community! submitted by /u/No-Literature-1930 [link] [comments]  ( 8 min )
    Transformers
Does anyone know where I can get a more detailed explanation for a better understanding of RNNs, LSTMs, Transformers, and BERT, along with practical implementations? submitted by /u/Potential_Plant_160 [link] [comments]  ( 8 min )
    Fast SAM (Segment Anything Model) review
You may find this interesting: a new post from the OpenCV.ai team about Fast SAM (Segment Anything Model). Short description: Fast SAM, an innovative approach to the Segment Anything task, drastically increases SAM model speed by 50 times. It introduces prompts to facilitate a broad range of problem-solving, breaking the barriers of traditional supervised learning. This blog post briefly explains the segment-anything task and the Fast SAM approach. More details are here submitted by /u/No-Independence5880 [link] [comments]  ( 8 min )
  • Open

    On the Stepwise Nature of Self-Supervised Learning
    Figure 1: stepwise behavior in self-supervised learning. When training common SSL algorithms, we find that the loss descends in a stepwise fashion (top left) and the learned embeddings iteratively increase in dimensionality (bottom left). Direct visualization of embeddings (right; top three PCA directions shown) confirms that embeddings are initially collapsed to a point, which then expands to a 1D manifold, a 2D manifold, and beyond concurrently with steps in the loss. It is widely believed that deep learning’s stunning success is due in part to its ability to discover and extract useful representations of complex data. Self-supervised learning (SSL) has emerged as a leading framework for learning these representations for images directly from unlabeled data, similar to how LLMs learn re…  ( 5 min )
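The embedding-dimensionality tracking described above can be reproduced with a simple PCA diagnostic. A minimal sketch, assuming a variance-threshold criterion (the `embedding_dim` helper and the 0.99 threshold are illustrative choices, not the paper's exact measure):

```python
import numpy as np

def embedding_dim(embeddings, var_threshold=0.99):
    # Count the top PCA directions needed to explain `var_threshold`
    # of the embedding variance; fully collapsed embeddings get dimension 0.
    centered = embeddings - embeddings.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(centered.T))[::-1]
    eigvals = np.clip(eigvals, 0, None)   # guard tiny negative rounding errors
    total = eigvals.sum()
    if total <= 1e-12:
        return 0                          # embeddings collapsed to one point
    ratio = np.cumsum(eigvals) / total
    return int(np.searchsorted(ratio, var_threshold) + 1)

t = np.linspace(-1, 1, 200)
line = np.outer(t, [1.0, 2.0, 0.5])       # embeddings on a 1D manifold
print(embedding_dim(line))                # 1
```

Calling this on embeddings logged at successive checkpoints would trace the 0 → 1 → 2 → … expansion shown in the figure.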
  • Open

    GEANN: Scalable Graph Augmentations for Multi-Horizon Time Series Forecasting. (arXiv:2307.03595v1 [cs.LG])
    Encoder-decoder deep neural networks have been increasingly studied for multi-horizon time series forecasting, especially in real-world applications. However, to forecast accurately, these sophisticated models typically rely on a large number of time series examples with substantial history. A rapidly growing topic of interest is forecasting time series which lack sufficient historical data -- often referred to as the ``cold start'' problem. In this paper, we introduce a novel yet simple method to address this problem by leveraging graph neural networks (GNNs) as a data augmentation for enhancing the encoder used by such forecasters. These GNN-based features can capture complex inter-series relationships, and their generation process can be optimized end-to-end with the forecasting task. We show that our architecture can use either data-driven or domain knowledge-defined graphs, scaling to incorporate information from multiple very large graphs with millions of nodes. In our target application of demand forecasting for a large e-commerce retailer, we demonstrate on both a small dataset of 100K products and a large dataset with over 2 million products that our method improves overall performance over competitive baseline models. More importantly, we show that it brings substantially more gains to ``cold start'' products such as those newly launched or recently out-of-stock.  ( 2 min )
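The graph augmentation idea above lends itself to a compact illustration: one message-passing step lets a history-less series inherit features from its graph neighbors. A sketch under simplified assumptions (a single mean-aggregation layer over a fixed adjacency; the actual method learns the GNN features end-to-end with the forecaster):

```python
import numpy as np

def graph_augmented_features(series_embed, adj):
    # series_embed: (n_series, d) base encoder features per series
    # adj: (n_series, n_series) adjacency (data-driven or domain-defined graph)
    # Returns (n_series, 2*d): original features concatenated with the
    # degree-normalized neighbor average -- one message-passing step.
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                      # isolated nodes keep zeros
    neighbor_avg = (adj @ series_embed) / deg
    return np.concatenate([series_embed, neighbor_avg], axis=1)

# A "cold start" series (no history -> zero embedding) still receives
# informative features from its graph neighbors:
embed = np.array([[0.0, 0.0],      # newly launched product, no history
                  [1.0, 2.0],
                  [3.0, 4.0]])
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=float)
aug = graph_augmented_features(embed, adj)
print(aug[0])   # cold-start row now carries the neighbor mean [2., 3.]
```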
    ContextLabeler Dataset: physical and virtual sensors data collected from smartphone usage in-the-wild. (arXiv:2307.03586v1 [cs.HC])
This paper describes a data collection campaign and the resulting dataset derived from smartphone sensors characterizing the daily life activities of 3 volunteers in a period of two weeks. The dataset is released as a collection of CSV files containing more than 45K data samples, where each sample is composed of 1332 features related to a heterogeneous set of physical and virtual sensors, including motion sensors, running applications, devices in proximity, and weather conditions. Moreover, each data sample is associated with a ground truth label that describes the user activity and the situation in which she was involved during the sensing experiment (e.g., working, at a restaurant, and doing a sports activity). To avoid introducing any bias during the data collection, we performed the sensing experiment in-the-wild, that is, by using the volunteers' devices, and without defining any constraint related to the user's behavior. For this reason, the collected dataset represents a useful source of real data to both define and evaluate a broad set of novel context-aware solutions (both algorithms and protocols) that aim to adapt their behavior according to the changes in the user's situation in a mobile environment.  ( 2 min )
    Equivariant Single View Pose Prediction Via Induced and Restricted Representations. (arXiv:2307.03704v1 [cs.CV])
Learning about the three-dimensional world from two-dimensional images is a fundamental problem in computer vision. An ideal neural network architecture for such tasks would leverage the fact that objects can be rotated and translated in three dimensions to make predictions about novel images. However, imposing SO(3)-equivariance on two-dimensional inputs is difficult because the group of three-dimensional rotations does not have a natural action on the two-dimensional plane. Specifically, it is possible that an element of SO(3) will rotate an image out of plane. We show that an algorithm that learns a three-dimensional representation of the world from two dimensional images must satisfy certain geometric consistency properties which we formulate as SO(2)-equivariance constraints. We use the induced and restricted representations of SO(2) on SO(3) to construct and classify architectures which satisfy these geometric consistency constraints. We prove that any architecture which respects said consistency constraints can be realized as an instance of our construction. We show that three previously proposed neural architectures for 3D pose prediction are special cases of our construction. We propose a new algorithm that is a learnable generalization of previously considered methods. We test our architecture on three pose prediction tasks and achieve SOTA results on both the PASCAL3D+ and SYMSOL pose estimation tasks.
    Distilling Universal and Joint Knowledge for Cross-Domain Model Compression on Time Series Data. (arXiv:2307.03347v1 [cs.LG])
For many real-world time series tasks, the computational complexity of prevalent deep learning models often hinders the deployment on resource-limited environments (e.g., smartphones). Moreover, due to the inevitable domain shift between model training (source) and deployment (target) stages, compressing those deep models under cross-domain scenarios becomes more challenging. Although some existing works have already explored cross-domain knowledge distillation for model compression, they are either biased to source data or heavily tangled between source and target data. To this end, we design a novel end-to-end framework called Universal and joint knowledge distillation (UNI-KD) for cross-domain model compression. In particular, we propose to transfer both the universal feature-level knowledge across source and target domains and the joint logit-level knowledge shared by both domains from the teacher to the student model via an adversarial learning scheme. More specifically, a feature-domain discriminator is employed to align teacher's and student's representations for universal knowledge transfer. A data-domain discriminator is utilized to prioritize the domain-shared samples for joint knowledge transfer. Extensive experimental results on four time series datasets demonstrate the superiority of our proposed method over state-of-the-art (SOTA) benchmarks.  ( 2 min )
    Auxiliary Functions as Koopman Observables: Data-Driven Polynomial Optimization for Dynamical Systems. (arXiv:2303.01483v2 [math.DS] UPDATED)
    We present a flexible data-driven method for dynamical system analysis that does not require explicit model discovery. The method is rooted in well-established techniques for approximating the Koopman operator from data and is implemented as a semidefinite program that can be solved numerically. Furthermore, the method is agnostic of whether data is generated through a deterministic or stochastic process, so its implementation requires no prior adjustments by the user to accommodate these different scenarios. Rigorous convergence results justify the applicability of the method, while also extending and uniting similar results from across the literature. Examples on discovering Lyapunov functions, performing ergodic optimization, and bounding extrema over attractors for both deterministic and stochastic dynamics exemplify these convergence results and demonstrate the performance of the method.
    One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention. (arXiv:2307.03576v1 [cs.LG])
    Recent works have empirically analyzed in-context learning and shown that transformers trained on synthetic linear regression tasks can learn to implement ridge regression, which is the Bayes-optimal predictor, given sufficient capacity [Aky\"urek et al., 2023], while one-layer transformers with linear self-attention and no MLP layer will learn to implement one step of gradient descent (GD) on a least-squares linear regression objective [von Oswald et al., 2022]. However, the theory behind these observations remains poorly understood. We theoretically study transformers with a single layer of linear self-attention, trained on synthetic noisy linear regression data. First, we mathematically show that when the covariates are drawn from a standard Gaussian distribution, the one-layer transformer which minimizes the pre-training loss will implement a single step of GD on the least-squares linear regression objective. Then, we find that changing the distribution of the covariates and weight vector to a non-isotropic Gaussian distribution has a strong impact on the learned algorithm: the global minimizer of the pre-training loss now implements a single step of $\textit{pre-conditioned}$ GD. However, if only the distribution of the responses is changed, then this does not have a large effect on the learned algorithm: even when the response comes from a more general family of $\textit{nonlinear}$ functions, the global minimizer of the pre-training loss still implements a single step of GD on a least-squares linear regression objective.
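The core claim above — that one layer of linear self-attention can implement one step of GD on a least-squares objective — is easy to verify numerically. A small sketch: the attention head is written directly as the inner-product form it reduces to (values carrying y_i x_i, identity keys/queries) rather than as explicit query/key/value matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, eta = 3, 8, 0.1

X = rng.normal(size=(n, d))          # in-context covariates
y = X @ rng.normal(size=d)           # linear responses
x_q = rng.normal(size=d)             # query point

# One step of GD from w0 = 0 on the least-squares loss (1/2)||Xw - y||^2:
w1 = eta * X.T @ y
gd_pred = x_q @ w1

# A single linear self-attention head (no softmax) realizes the same map:
#   sum_i (x_q . x_i) * y_i * eta  =  x_q^T (eta * X^T y)
attn_pred = eta * np.sum((X @ x_q) * y)

print(np.isclose(gd_pred, attn_pred))  # True: identical predictions
```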
    When does the ID algorithm fail?. (arXiv:2307.03750v1 [stat.ME])
The ID algorithm solves the problem of identification of interventional distributions of the form p(Y | do(a)) in graphical causal models, and has been formulated in a number of ways [12, 9, 6]. The ID algorithm is sound (outputs the correct functional of the observed data distribution whenever p(Y | do(a)) is identified in the causal model represented by the input graph), and complete (explicitly flags as a failure any input p(Y | do(a)) whenever this distribution is not identified in the causal model represented by the input graph). The reference [9] provides a result, the so-called "hedge criterion" (Corollary 3), which aims to give a graphical characterization of situations when the ID algorithm fails to identify its input in terms of a structure in the input graph called the hedge. While the ID algorithm is, indeed, a sound and complete algorithm, and the hedge structure does arise whenever the input distribution is not identified, Corollary 3 presented in [9] is incorrect as stated. In this note, I outline the modern presentation of the ID algorithm, discuss a simple counterexample to Corollary 3, and provide a number of graphical characterizations of the ID algorithm failing to identify its input distribution.
    BOF-UCB: A Bayesian-Optimistic Frequentist Algorithm for Non-Stationary Contextual Bandits. (arXiv:2307.03587v1 [cs.LG])
    We propose a novel Bayesian-Optimistic Frequentist Upper Confidence Bound (BOF-UCB) algorithm for stochastic contextual linear bandits in non-stationary environments. This unique combination of Bayesian and frequentist principles enhances adaptability and performance in dynamic settings. The BOF-UCB algorithm utilizes sequential Bayesian updates to infer the posterior distribution of the unknown regression parameter, and subsequently employs a frequentist approach to compute the Upper Confidence Bound (UCB) by maximizing the expected reward over the posterior distribution. We provide theoretical guarantees of BOF-UCB's performance and demonstrate its effectiveness in balancing exploration and exploitation on synthetic datasets and classical control tasks in a reinforcement learning setting. Our results show that BOF-UCB outperforms existing methods, making it a promising solution for sequential decision-making in non-stationary environments.
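The Bayesian-update-plus-frequentist-UCB structure described above can be sketched for the stationary linear case. This omits the non-stationarity handling that is BOF-UCB's main contribution and uses a fixed confidence multiplier `beta`; the class and parameter names are illustrative:

```python
import numpy as np

class BayesLinUCB:
    # Gaussian posterior over the regression parameter, updated sequentially;
    # the UCB adds a confidence width on top of the posterior-mean reward.
    def __init__(self, d, noise_var=1.0, prior_var=1.0, beta=2.0):
        self.S = np.eye(d) / prior_var   # posterior precision
        self.b = np.zeros(d)
        self.noise_var = noise_var
        self.beta = beta

    def update(self, x, reward):
        self.S += np.outer(x, x) / self.noise_var
        self.b += x * reward / self.noise_var

    def ucb(self, x):
        cov = np.linalg.inv(self.S)
        mean = cov @ self.b
        return x @ mean + self.beta * np.sqrt(x @ cov @ x)

    def select(self, arms):
        return max(range(len(arms)), key=lambda i: self.ucb(arms[i]))

rng = np.random.default_rng(0)
theta = np.array([1.0, 0.2])                       # unknown parameter
arms = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
agent = BayesLinUCB(d=2, noise_var=0.01)
for _ in range(100):
    i = agent.select(arms)
    agent.update(arms[i], arms[i] @ theta + 0.1 * rng.normal())
print(agent.select(arms))   # settles on the higher-reward arm (index 0)
```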
    When Fair Classification Meets Noisy Protected Attributes. (arXiv:2307.03306v1 [cs.LG])
The operationalization of algorithmic fairness comes with several practical challenges, not the least of which is the availability or reliability of protected attributes in datasets. In real-world contexts, practical and legal impediments may prevent the collection and use of demographic data, making it difficult to ensure algorithmic fairness. While initial fairness algorithms did not consider these limitations, recent proposals aim to achieve algorithmic fairness in classification by incorporating noisiness in protected attributes or not using protected attributes at all. To the best of our knowledge, this is the first head-to-head study of fair classification algorithms to compare attribute-reliant, noise-tolerant and attribute-blind algorithms along the dual axes of predictivity and fairness. We evaluated these algorithms via case studies on four real-world datasets and synthetic perturbations. Our study reveals that attribute-blind and noise-tolerant fair classifiers can potentially achieve a similar level of performance as attribute-reliant algorithms, even when protected attributes are noisy. However, implementing them in practice requires careful nuance. Our study provides insights into the practical implications of using fair classification algorithms in scenarios where protected attributes are noisy or partially available.
    Federated Unlearning via Active Forgetting. (arXiv:2307.03363v1 [cs.LG])
The increasing concerns regarding the privacy of machine learning models have catalyzed the exploration of machine unlearning, i.e., a process that removes the influence of training data on machine learning models. This concern also arises in the realm of federated learning, prompting researchers to address the federated unlearning problem. However, federated unlearning remains challenging. Existing unlearning methods can be broadly categorized into two approaches, i.e., exact unlearning and approximate unlearning. Firstly, implementing exact unlearning, which typically relies on the partition-aggregation framework, in a distributed manner does not improve time efficiency theoretically. Secondly, existing federated (approximate) unlearning methods suffer from imprecise data influence estimation, significant computational burden, or both. To this end, we propose a novel federated unlearning framework based on incremental learning, which is independent of specific models and federated settings. Our framework differs from existing federated unlearning methods that rely on approximate retraining or data influence estimation. Instead, we leverage new memories to overwrite old ones, imitating the process of \textit{active forgetting} in neurology. Specifically, the model, intended to unlearn, serves as a student model that continuously learns from randomly initiated teacher models. To prevent catastrophic forgetting of non-target data, we utilize elastic weight consolidation to elastically constrain weight changes. Extensive experiments on three benchmark datasets demonstrate the efficiency and effectiveness of our proposed method. The result of backdoor attacks demonstrates that our proposed method achieves satisfactory completeness.
    STG-MTL: Scalable Task Grouping for Multi-Task Learning Using Data Map. (arXiv:2307.03374v1 [cs.LG])
Multi-Task Learning (MTL) is a powerful technique that has gained popularity due to its performance improvement over traditional Single-Task Learning (STL). However, MTL is often challenging because there is an exponential number of possible task groupings, which can make it difficult to choose the best one, and some groupings might produce performance degradation due to negative interference between tasks. Furthermore, existing solutions suffer severely from scalability issues, limiting any practical application. In our paper, we propose a new data-driven method that addresses these challenges and provides a scalable and modular solution for classification task grouping based on hand-crafted features, specifically Data Maps, which capture the training behavior for each classification task during the MTL training. We experiment with the method, demonstrating its effectiveness even on an unprecedented number of tasks (up to 100).
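The Data Map features the method groups tasks by — confidence and variability of the model's predictions across training epochs — can be sketched per example (and averaged per task) as follows. The probabilities below are hypothetical:

```python
import numpy as np

def data_map(epoch_probs):
    # epoch_probs: (n_epochs, n_examples) probability of the true class
    # recorded at each training epoch. Returns the two Data Map coordinates:
    # confidence (mean over epochs) and variability (std over epochs).
    return epoch_probs.mean(axis=0), epoch_probs.std(axis=0)

# true-class probabilities for 4 examples across 5 epochs (hypothetical):
p = np.array([[0.90, 0.20, 0.5, 0.95],
              [0.92, 0.25, 0.8, 0.90],
              [0.95, 0.30, 0.3, 0.93],
              [0.97, 0.20, 0.9, 0.96],
              [0.96, 0.22, 0.4, 0.94]])
conf, var = data_map(p)
# easy-to-learn examples: high confidence, low variability;
# ambiguous examples: high variability (column 2 here)
print(conf.round(2), var.round(2))
```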
    Undecimated Wavelet Transform for Word Embedded Semantic Marginal Autoencoder in Security improvement and Denoising different Languages. (arXiv:2307.03679v1 [cs.CL])
By incorporating the undecimated wavelet transform within a Word Embedded Semantic Marginal Autoencoder (WESMA), this research study provides a novel strategy for improving security measures and denoising multiple languages. The incorporation of these strategies is intended to address the issues of robustness, privacy, and multilingualism in data processing applications. The undecimated wavelet transform is used as a feature extraction tool to identify prominent language patterns and structural qualities in the input data. The proposed system may successfully capture significant information while preserving the temporal and geographical links within the data by employing this transform. This improves security measures by increasing the system's ability to detect abnormalities, discover hidden patterns, and distinguish between legitimate content and dangerous threats. The Word Embedded Semantic Marginal Autoencoder also functions as an intelligent framework for dimensionality and noise reduction. The autoencoder effectively learns the underlying semantics of the data and reduces noise components by exploiting word embeddings and semantic context. As a result, data quality and accuracy are increased in following processing stages. The suggested methodology is tested using a diversified dataset that includes several languages and security scenarios. The experimental results show that the proposed approach is effective in attaining security enhancement and denoising capabilities across multiple languages. The system is strong in dealing with linguistic variances, producing consistent outcomes regardless of the language used. Furthermore, incorporating the undecimated wavelet transform considerably improves the system's ability to efficiently address complex security concerns.  ( 2 min )
    Evaluating the Effectiveness of Large Language Models in Representing Textual Descriptions of Geometry and Spatial Relations. (arXiv:2307.03678v1 [cs.CL])
    This research focuses on assessing the ability of large language models (LLMs) in representing geometries and their spatial relations. We utilize LLMs including GPT-2 and BERT to encode the well-known text (WKT) format of geometries and then feed their embeddings into classifiers and regressors to evaluate the effectiveness of the LLMs-generated embeddings for geometric attributes. The experiments demonstrate that while the LLMs-generated embeddings can preserve geometry types and capture some spatial relations (up to 73% accuracy), challenges remain in estimating numeric values and retrieving spatially related objects. This research highlights the need for improvement in terms of capturing the nuances and complexities of the underlying geospatial data and integrating domain knowledge to support various GeoAI applications using foundation models.
    SAR: Generalization of Physiological Agility and Dexterity via Synergistic Action Representation. (arXiv:2307.03716v1 [cs.RO])
    Learning effective continuous control policies in high-dimensional systems, including musculoskeletal agents, remains a significant challenge. Over the course of biological evolution, organisms have developed robust mechanisms for overcoming this complexity to learn highly sophisticated strategies for motor control. What accounts for this robust behavioral flexibility? Modular control via muscle synergies, i.e. coordinated muscle co-contractions, is considered to be one putative mechanism that enables organisms to learn muscle control in a simplified and generalizable action space. Drawing inspiration from this evolved motor control strategy, we use physiologically accurate human hand and leg models as a testbed for determining the extent to which a Synergistic Action Representation (SAR) acquired from simpler tasks facilitates learning more complex tasks. We find in both cases that SAR-exploiting policies significantly outperform end-to-end reinforcement learning. Policies trained with SAR were able to achieve robust locomotion on a wide set of terrains with high sample efficiency, while baseline approaches failed to learn meaningful behaviors. Additionally, policies trained with SAR on a multiobject manipulation task significantly outperformed (>70% success) baseline approaches (<20% success). Both of these SAR-exploiting policies were also found to generalize zero-shot to out-of-domain environmental conditions, while policies that did not adopt SAR failed to generalize. Finally, we establish the generality of SAR on broader high-dimensional control problems using a robotic manipulation task set and a full-body humanoid locomotion task. To the best of our knowledge, this investigation is the first of its kind to present an end-to-end pipeline for discovering synergies and using this representation to learn high-dimensional continuous control across a wide diversity of tasks.
    Scalable Membership Inference Attacks via Quantile Regression. (arXiv:2307.03694v1 [cs.LG])
Membership inference attacks are designed to determine, using black box access to trained models, whether a particular example was used in training or not. Membership inference can be formalized as a hypothesis testing problem. The most effective existing attacks estimate the distribution of some test statistic (usually the model's confidence on the true label) on points that were (and were not) used in training by training many \emph{shadow models} -- i.e. models of the same architecture as the model being attacked, trained on a random subsample of data. While effective, these attacks are extremely computationally expensive, especially when the model under attack is large. We introduce a new class of attacks based on performing quantile regression on the distribution of confidence scores induced by the model under attack on points that are not used in training. We show that our method is competitive with state-of-the-art shadow model attacks, while requiring substantially less compute because our attack requires training only a single model. Moreover, unlike shadow model attacks, our proposed attack does not require any knowledge of the architecture of the model under attack and is therefore truly ``black-box''. We show the efficacy of this approach in an extensive series of experiments on various datasets and model architectures.
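The quantile idea can be sketched in its simplest, covariate-free form: estimate a high quantile of the confidence distribution on points not used in training, and flag anything above it as a member. The paper learns a quantile *regressor* conditioned on each example; the Beta distributions below are synthetic stand-ins for member/non-member confidence scores:

```python
import numpy as np

def quantile_mia(nonmember_conf, target_conf, q=0.95):
    # Flag target points whose model confidence exceeds the q-quantile of
    # confidences on known non-members (a single empirical quantile here,
    # rather than a learned per-example quantile regressor).
    threshold = np.quantile(nonmember_conf, q)
    return target_conf > threshold

rng = np.random.default_rng(1)
nonmembers = rng.beta(2, 2, size=1000)   # moderate confidences off-training
members = rng.beta(8, 1, size=10)        # models are overconfident on training data
flags = quantile_mia(nonmembers, members)
print(flags.mean())                      # most overconfident points get flagged
```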
    INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers. (arXiv:2307.03712v1 [cs.LG])
    The recent rise of large language models (LLMs) has resulted in increased efforts towards running LLMs at reduced precision. Running LLMs at lower precision supports resource constraints and furthers their democratization, enabling users to run billion-parameter LLMs on their personal devices. To supplement this ongoing effort, we propose INT-FP-QSim: an open-source simulator that enables flexible evaluation of LLMs and vision transformers at various numerical precisions and formats. INT-FP-QSim leverages existing open-source repositories such as TensorRT, QPytorch and AIMET for a combined simulator that supports various floating point and integer formats. With the help of our simulator, we survey the impact of different numerical formats on the performance of LLMs and vision transformers at 4-bit weights and 4-bit or 8-bit activations. We also compare recently proposed methods like Adaptive Block Floating Point, SmoothQuant, GPTQ and RPTQ on the model performances. We hope INT-FP-QSim will enable researchers to flexibly simulate models at various precisions to support further research in quantization of LLMs and vision transformers.
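The basic operation such a precision simulator applies to weights and activations is quantize-dequantize ("fake quantization"). A generic symmetric integer sketch — not INT-FP-QSim's actual implementation, which supports many more formats:

```python
import numpy as np

def fake_quant_int(x, bits=4):
    # Simulate integer storage: scale to the signed integer grid, round,
    # clip to the representable range, and scale back to float.
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for INT4
    scale = np.max(np.abs(x)) / qmax
    if scale == 0:
        return x.copy()
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

w = np.array([-1.0, -0.3, 0.0, 0.45, 1.0])
w4 = fake_quant_int(w, bits=4)
print(w4)   # values snapped to the 4-bit grid; per-value error <= scale/2
```

Running a model with every weight tensor passed through such a function approximates its accuracy at the reduced precision without needing integer kernels.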
    Learning Theory of Distribution Regression with Neural Networks. (arXiv:2307.03487v1 [stat.ML])
In this paper, we aim at establishing an approximation theory and a learning theory of distribution regression via a fully connected neural network (FNN). In contrast to the classical regression methods, the input variables of distribution regression are probability measures. Then we often need to perform a second-stage sampling process to approximate the actual information of the distribution. On the other hand, the classical neural network structure requires the input variable to be a vector. When the input samples are probability distributions, the traditional deep neural network method cannot be directly used and the difficulty arises for distribution regression. A well-defined neural network structure for distribution inputs is highly desirable. There has been no mathematical model or theoretical analysis of the neural network realization of distribution regression. To overcome technical difficulties and address this issue, we establish a novel fully connected neural network framework to realize an approximation theory of functionals defined on the space of Borel probability measures. Furthermore, based on the established functional approximation results, in the hypothesis space induced by the novel FNN structure with distribution inputs, almost optimal learning rates for the proposed distribution regression model up to logarithmic terms are derived via a novel two-stage error decomposition technique.  ( 2 min )
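The "second-stage sampling" step — turning a bag of samples from an input measure into a fixed-length vector a standard FNN can consume — can be illustrated with empirical moments (an illustrative feature choice for exposition, not the paper's construction):

```python
import numpy as np

def distribution_features(samples, degree=3):
    # Summarize a bag of samples from one input distribution by its first
    # `degree` empirical moments; the resulting fixed vector can then be fed
    # to an ordinary fully connected network.
    return np.array([np.mean(samples ** k) for k in range(1, degree + 1)])

rng = np.random.default_rng(0)
bag = rng.normal(loc=2.0, scale=0.5, size=5000)   # samples from one input measure
feats = distribution_features(bag)
print(feats.round(2))   # ≈ [2.0, 4.25, 9.5] for N(2, 0.25)
```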
    Accelerated Optimization Landscape of Linear-Quadratic Regulator. (arXiv:2307.03590v1 [math.OC])
Linear-quadratic regulator (LQR) is a landmark problem in the field of optimal control, which is the concern of this paper. Generally, LQR is classified into state-feedback LQR (SLQR) and output-feedback LQR (OLQR) based on whether the full state is obtained. It has been suggested in existing literature that both the SLQR and the OLQR could be viewed as \textit{constrained nonconvex matrix optimization} problems in which the only variable to be optimized is the feedback gain matrix. In this paper, we introduce a first-order accelerated optimization framework of handling the LQR problem, and give its convergence analysis for the cases of SLQR and OLQR, respectively. Specifically, a Lipschitz Hessian property of LQR performance criterion is presented, which turns out to be a crucial property for the application of modern optimization techniques. For the SLQR problem, a continuous-time hybrid dynamic system is introduced, whose solution trajectory is shown to converge exponentially to the optimal feedback gain with Nesterov-optimal order $1-\frac{1}{\sqrt{\kappa}}$ ($\kappa$ the condition number). Then, the symplectic Euler scheme is utilized to discretize the hybrid dynamic system, and a Nesterov-type method with a restarting rule is proposed that preserves the continuous-time convergence rate, i.e., the discretized algorithm admits the Nesterov-optimal convergence order. For the OLQR problem, a Hessian-free accelerated framework is proposed, which is a two-procedure method consisting of semiconvex function optimization and negative curvature exploitation. In a time $\mathcal{O}(\epsilon^{-7/4}\log(1/\epsilon))$, the method can find an $\epsilon$-stationary point of the performance criterion; this entails that the method improves upon the $\mathcal{O}(\epsilon^{-2})$ complexity of vanilla gradient descent. Moreover, our method provides the second-order guarantee of stationary point.
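The accelerated first-order idea can be illustrated on a scalar toy LQR: optimize the feedback gain directly with a Nesterov-style momentum loop. This is only a sketch under strong simplifications (scalar system, finite-horizon cost, numerical gradient, fixed momentum); it has none of the paper's LQR-specific analysis, restarting rule, or convergence guarantees:

```python
def lqr_cost(k, a=0.9, b=0.5, q=1.0, r=0.1, x0=1.0, T=200):
    # Finite-horizon cost of the scalar state-feedback policy u = -k x
    # under dynamics x' = a x + b u.
    x, cost = x0, 0.0
    for _ in range(T):
        u = -k * x
        cost += q * x**2 + r * u**2
        x = a * x + b * u
    return cost

def nesterov_minimize(f, k0, lr=0.02, momentum=0.8, steps=300, eps=1e-5):
    # Generic Nesterov-accelerated descent with a central-difference gradient.
    k, v = k0, 0.0
    for _ in range(steps):
        look = k + momentum * v           # look-ahead point
        grad = (f(look + eps) - f(look - eps)) / (2 * eps)
        v = momentum * v - lr * grad
        k += v
    return k

k_star = nesterov_minimize(lqr_cost, k0=0.0)
print(k_star, lqr_cost(k_star))   # gain near the optimum, cost well below k = 0
```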
    PAC bounds of continuous Linear Parameter-Varying systems related to neural ODEs. (arXiv:2307.03630v1 [cs.LG])
    We consider the problem of learning Neural Ordinary Differential Equations (neural ODEs) within the context of Linear Parameter-Varying (LPV) systems in continuous-time. LPV systems contain bilinear systems which are known to be universal approximators for non-linear systems. Moreover, a large class of neural ODEs can be embedded into LPV systems. As our main contribution we provide Probably Approximately Correct (PAC) bounds under stability for LPV systems related to neural ODEs. The resulting bounds have the advantage that they do not depend on the integration interval.
    Online Network Source Optimization with Graph-Kernel MAB. (arXiv:2307.03641v1 [cs.LG])
    We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm to learn online the optimal source placement in large-scale networks, such that the reward obtained from a priori unknown network processes is maximized. The uncertainty calls for online learning, which however suffers from the curse of dimensionality. To achieve sample efficiency, we describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations. This enables a data-efficient learning framework, whose learning rate scales with the dimension of the spectral representation model instead of that of the network. We then propose Grab-UCB, an online sequential decision strategy that learns the parameters of the spectral representation while optimizing the action strategy. We derive performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy. We introduce a computationally simplified solving method, Grab-arm-Light, an algorithm that walks along the edges of the polytope representing the objective function. Simulation results show that the proposed online learning algorithm outperforms baseline offline methods that typically separate the learning phase from the testing one. The results confirm the theoretical findings, and further highlight the gain of the proposed online learning strategy in terms of cumulative regret, sample efficiency and computational complexity.
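As a minimal illustration of the bandit machinery underlying Grab-UCB, here is plain UCB1 on a small arm set; the graph-kernel and spectral dictionary components that make Grab-UCB sample-efficient on networks are omitted, and the arm means, noise level, and horizon are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def ucb1(means, horizon=5000):
    """Plain UCB1: pull the arm maximizing (empirical mean + exploration bonus)."""
    n_arms = len(means)
    counts = np.zeros(n_arms)
    values = np.zeros(n_arms)
    for t in range(horizon):
        if t < n_arms:
            arm = t                                    # play each arm once to initialize
        else:
            bonus = np.sqrt(2 * np.log(t) / counts)    # optimism in the face of uncertainty
            arm = int(np.argmax(values + bonus))
        reward = rng.normal(means[arm], 0.1)           # noisy reward from the chosen arm
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
    return counts

counts = ucb1([0.2, 0.5, 0.8])
```

With a clear gap between arms, the pull counts concentrate on the best arm while suboptimal arms are pulled only logarithmically often.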
    Differentiable Turbulence. (arXiv:2307.03683v1 [physics.flu-dyn])
    Deep learning is increasingly becoming a promising pathway to improving the accuracy of sub-grid scale (SGS) turbulence closure models for large eddy simulations (LES). We leverage the concept of differentiable turbulence, whereby an end-to-end differentiable solver is used in combination with physics-inspired choices of deep learning architectures to learn highly effective and versatile SGS models for two-dimensional turbulent flow. We perform an in-depth analysis of the inductive biases in the chosen architectures, finding that the inclusion of small-scale non-local features is most critical to effective SGS modeling, while large-scale features can improve pointwise accuracy of the a-posteriori solution field. The filtered velocity gradient tensor can be mapped directly to the SGS stress via decomposition of the inputs and outputs into isotropic, deviatoric, and anti-symmetric components. We see that the model can generalize to a variety of flow configurations, including higher and lower Reynolds numbers and different forcing conditions. We show that the differentiable physics paradigm is more successful than offline, a-priori learning, and that hybrid solver-in-the-loop approaches to deep learning offer an ideal balance between computational efficiency, accuracy, and generalization. Our experiments provide physics-based recommendations for deep-learning based SGS modeling for generalizable closure modeling of turbulence.
    Incentive-Theoretic Bayesian Inference for Collaborative Science. (arXiv:2307.03748v1 [stat.ME])
    Contemporary scientific research is a distributed, collaborative endeavor, carried out by teams of researchers, regulatory institutions, funding agencies, commercial partners, and scientific bodies, all interacting with each other and facing different incentives. To maintain scientific rigor, statistical methods should acknowledge this state of affairs. To this end, we study hypothesis testing when there is an agent (e.g., a researcher or a pharmaceutical company) with a private prior about an unknown parameter and a principal (e.g., a policymaker or regulator) who wishes to make decisions based on the parameter value. The agent chooses whether to run a statistical trial based on their private prior and then the result of the trial is used by the principal to reach a decision. We show how the principal can conduct statistical inference that leverages the information that is revealed by an agent's strategic behavior -- their choice to run a trial or not. In particular, we show how the principal can design a policy to elucidate partial information about the agent's private prior beliefs and use this to control the posterior probability of the null. One implication is a simple guideline for the choice of significance threshold in clinical trials: the type-I error level should be set to be strictly less than the cost of the trial divided by the firm's profit if the trial is successful.
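The closing guideline admits a one-line calculation; the dollar figures below are hypothetical.

```python
def max_type1_level(trial_cost, profit_if_success):
    """Upper bound from the abstract's guideline: the type-I error level
    should be set strictly below (trial cost) / (profit if successful)."""
    return trial_cost / profit_if_success

# Hypothetical numbers: a $10M trial and a $500M payoff imply alpha < 0.02
alpha_bound = max_type1_level(10e6, 500e6)
```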
    PseudoCell: Hard Negative Mining as Pseudo Labeling for Deep Learning-Based Centroblast Cell Detection. (arXiv:2307.03211v1 [q-bio.QM])
    Patch classification models based on deep learning have been utilized in whole-slide images (WSI) of H&E-stained tissue samples to assist pathologists in grading follicular lymphoma patients. However, these approaches still require pathologists to manually identify centroblast cells and provide refined labels for optimal performance. To address this, we propose PseudoCell, an object detection framework to automate centroblast detection in WSI (source code is available at https://github.com/IoBT-VISTEC/PseudoCell.git). This framework incorporates centroblast labels from pathologists and combines them with pseudo-negative labels obtained from undersampled false-positive predictions using the cell's morphological features. By employing PseudoCell, pathologists' workload can be reduced, as it accurately narrows down the areas requiring their attention when examining tissue. Depending on the confidence threshold, PseudoCell can eliminate 58.18-99.35% of non-centroblast tissue areas on WSI. This study presents a practical centroblast prescreening method that does not require pathologists' refined labels for improvement. Detailed guidance on the practical implementation of PseudoCell is provided in the discussion section.
    Unpaired Multi-View Graph Clustering with Cross-View Structure Matching. (arXiv:2307.03476v1 [cs.LG])
    Multi-view clustering (MVC), which effectively fuses information from multiple views for better performance, has received increasing attention. Most existing MVC methods assume that multi-view data are fully paired, which means that the mappings of all corresponding samples between views are pre-defined or given in advance. However, the data correspondence is often incomplete in real-world applications due to data corruption or sensor differences, referred to as the data-unpaired problem (DUP) in multi-view literature. Although several attempts have been made to address the DUP issue, they suffer from the following drawbacks: 1) Most methods focus on the feature representation while ignoring the structural information of multi-view data, which is essential for clustering tasks; 2) Existing methods for partially unpaired problems rely on pre-given cross-view alignment information, resulting in their inability to handle fully unpaired problems; 3) The parameters they inevitably introduce degrade the efficiency and applicability of the models. To tackle these issues, we propose a novel parameter-free graph clustering framework termed Unpaired Multi-view Graph Clustering framework with Cross-View Structure Matching (UPMGC-SM). Specifically, unlike the existing methods, UPMGC-SM effectively utilizes the structural information from each view to refine cross-view correspondences. Besides, our UPMGC-SM is a unified framework for both the fully and partially unpaired multi-view graph clustering. Moreover, existing graph clustering methods can adopt our UPMGC-SM to enhance their ability for unpaired scenarios. Extensive experiments demonstrate the effectiveness and generalization of our proposed framework for both paired and unpaired datasets.
    Programmable Synthetic Tabular Data Generation. (arXiv:2307.03577v1 [cs.LG])
    Large amounts of tabular data remain underutilized due to privacy, data quality, and data sharing limitations. While training a generative model producing synthetic data resembling the original distribution addresses some of these issues, most applications require additional constraints from the generated data. Existing synthetic data approaches are limited as they typically only handle specific constraints, e.g., differential privacy (DP) or increased fairness, and lack an accessible interface for declaring general specifications. In this work, we introduce ProgSyn, the first programmable synthetic tabular data generation algorithm that allows for comprehensive customization over the generated data. To ensure high data quality while adhering to custom specifications, ProgSyn pre-trains a generative model on the original dataset and fine-tunes it on a differentiable loss automatically derived from the provided specifications. These can be programmatically declared using statistical and logical expressions, supporting a wide range of requirements (e.g., DP or fairness, among others). We conduct an extensive experimental evaluation of ProgSyn on a number of constraints, achieving a new state-of-the-art on some, while remaining general. For instance, at the same fairness level we achieve 2.3% higher downstream accuracy than the state-of-the-art in fair synthetic data generation on the Adult dataset. Overall, ProgSyn provides a versatile and accessible framework for generating constrained synthetic tabular data, allowing for specifications that generalize beyond the capabilities of prior work.
    Encoder-Decoder Networks for Self-Supervised Pretraining and Downstream Signal Bandwidth Regression on Digital Antenna Arrays. (arXiv:2307.03327v1 [cs.LG])
    This work presents the first application of self-supervised learning to data from digital antenna arrays. Encoder-decoder networks are pretrained on digital array data to perform a self-supervised noisy-reconstruction task called channel in-painting, in which the network infers the contents of array data that has been masked with zeros. The self-supervised step requires no human-labeled data. The encoder architecture and weights from pretraining are then transferred to a new network with a task-specific decoder, and the new network is trained on a small volume of labeled data. We show that pretraining on the unlabeled data allows the new network to perform the task of bandwidth regression on the digital array data better than an equivalent network that is trained on the same labeled data from random initialization.
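The channel in-painting pretext task can be sketched as follows; the array shape and masking ratio are illustrative assumptions, and real array data would be complex-valued rather than the real-valued stand-in used here.

```python
import numpy as np

rng = np.random.default_rng(1)

def mask_channels(x, p=0.3):
    """Build a self-supervised in-painting pair by zeroing random channels.

    x stands in for array data of shape (channels, samples); the network's
    job is to reconstruct the masked channels from the surviving ones.
    """
    mask = rng.random(x.shape[0]) < p   # choose channels to hide
    corrupted = x.copy()
    corrupted[mask] = 0.0               # zero-mask: the reconstruction target stays in x
    return corrupted, x, mask           # (network input, target, mask)

x = rng.normal(size=(8, 64))
inp, target, mask = mask_channels(x)
```

No labels are needed: the clean array itself is the supervision signal, which is what makes the pretraining step label-free.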
    GeoPhy: Differentiable Phylogenetic Inference via Geometric Gradients of Tree Topologies. (arXiv:2307.03675v1 [cs.LG])
    Phylogenetic inference, grounded in molecular evolution models, is essential for understanding the evolutionary relationships in biological data. Accounting for the uncertainty of phylogenetic tree variables, which include tree topologies and evolutionary distances on branches, is crucial for accurately inferring species relationships from molecular data and tasks requiring variable marginalization. Variational Bayesian methods are key to developing scalable, practical models; however, it remains challenging to conduct phylogenetic inference without restricting the combinatorially vast number of possible tree topologies. In this work, we introduce a novel, fully differentiable formulation of phylogenetic inference that leverages a unique representation of topological distributions in continuous geometric spaces. Through practical considerations on design spaces and control variates for gradient estimations, our approach, GeoPhy, enables variational inference without limiting the topological candidates. In experiments using real benchmark datasets, GeoPhy significantly outperformed other approximate Bayesian methods that considered whole topologies.
    Backward Feature Correction: How Deep Learning Performs Deep (Hierarchical) Learning. (arXiv:2001.04413v6 [cs.LG] UPDATED)
    Deep learning is also known as hierarchical learning, where the learner _learns_ to represent a complicated target function by decomposing it into a sequence of simpler functions to reduce sample and time complexity. This paper formally analyzes how multi-layer neural networks can perform such hierarchical learning _efficiently_ and _automatically_ by SGD on the training objective. On the conceptual side, we present a theoretical characterization of how certain types of deep (i.e. super-constant layer) neural networks can still be sample and time efficiently trained on some hierarchical tasks, when no existing algorithm (including layerwise training, kernel method, etc) is known to be efficient. We establish a new principle called "backward feature correction", where the errors in the lower-level features can be automatically corrected when training together with the higher-level layers. We believe this is a key behind how deep learning is performing deep (hierarchical) learning, as opposed to layerwise learning or simulating some non-hierarchical method. On the technical side, we show for every input dimension $d > 0$, there is a concept class of degree $\omega(1)$ multi-variate polynomials so that, using $\omega(1)$-layer neural networks as learners, SGD can learn any function from this class in $\mathsf{poly}(d)$ time to any $\frac{1}{\mathsf{poly}(d)}$ error, through learning to represent it as a composition of $\omega(1)$ layers of quadratic functions using "backward feature correction." In contrast, we do not know any other simpler algorithm (including layerwise training, applying kernel method sequentially, training a two-layer network, etc) that can learn this concept class in $\mathsf{poly}(d)$ time even to any $d^{-0.01}$ error. As a side result, we prove $d^{\omega(1)}$ lower bounds for several non-hierarchical learners, including any kernel methods.
    Equivariant Spherical CNN for Data Efficient and High-Performance Medical Image Processing. (arXiv:2307.03298v1 [eess.IV])
    This work highlights the significance of equivariant networks as efficient and high-performance approaches for tomography applications. Our study builds upon the limitations of Convolutional Neural Networks (CNNs), which have shown promise in post-processing various medical imaging systems. However, the efficiency of conventional CNNs heavily relies on an undiminished and proper training set. To tackle this issue, in this study, we introduce an equivariant network, aiming to reduce CNN's dependency on specific training sets. We evaluate the efficacy of equivariant CNNs on spherical signals for tomographic medical imaging problems. Our results demonstrate superior quality and computational efficiency of spherical CNNs (SCNNs) in denoising and reconstructing benchmark problems. Furthermore, we propose a novel approach to employ SCNNs as a complement to conventional image reconstruction tools, enhancing the outcomes while reducing reliance on the training set. Across all cases, we observe a significant decrease in computational costs while maintaining the same or higher quality of image processing using SCNNs compared to CNNs. Additionally, we explore the potential of this network for broader tomography applications, particularly those requiring omnidirectional representation.
    Variational quantum regression algorithm with encoded data structure. (arXiv:2307.03334v1 [quant-ph])
    Variational quantum algorithms (VQAs) prevail to solve practical problems such as combinatorial optimization, quantum chemistry simulation, quantum machine learning, and quantum error correction on noisy quantum computers. For variational quantum machine learning, a variational algorithm with model interpretability built into the algorithm is yet to be exploited. In this paper, we construct a quantum regression algorithm and identify the direct relation of variational parameters to learned regression coefficients, while employing a circuit that directly encodes the data in quantum amplitudes reflecting the structure of the classical data table. The algorithm is particularly suitable for well-connected qubits. With compressed encoding and digital-analog gate operation, the run time complexity is logarithmically more advantageous than that for digital 2-local gate native hardware with the number of data entries encoded, a decent improvement in noisy intermediate-scale quantum computers and a minor improvement for large-scale quantum computing. Our suggested method of compressed binary encoding offers a remarkable reduction in the number of physical qubits needed when compared to the traditional one-hot-encoding technique with the same input data. The algorithm inherently performs linear regression but can also be used easily for nonlinear regression by building nonlinear features into the training data. In terms of the measured cost function, which distinguishes a good model from a poor one during model training, it will be effective only when the number of features is much less than the number of records, so that the encoded data structure is observable. To echo this finding and mitigate hardware noise in practice, ensemble model training from the quantum regression model learning with important feature selection from regularization is incorporated and illustrated numerically.
    Simulation-free Schr\"odinger bridges via score and flow matching. (arXiv:2307.03672v1 [cs.LG])
    We present simulation-free score and flow matching ([SF]$^2$M), a simulation-free objective for inferring stochastic dynamics given unpaired source and target samples drawn from arbitrary distributions. Our method generalizes both the score-matching loss used in the training of diffusion models and the recently proposed flow matching loss used in the training of continuous normalizing flows. [SF]$^2$M interprets continuous-time stochastic generative modeling as a Schr\"odinger bridge (SB) problem. It relies on static entropy-regularized optimal transport, or a minibatch approximation, to efficiently learn the SB without simulating the learned stochastic process. We find that [SF]$^2$M is more efficient and gives more accurate solutions to the SB problem than simulation-based methods from prior work. Finally, we apply [SF]$^2$M to the problem of learning cell dynamics from snapshot data. Notably, [SF]$^2$M is the first method to accurately model cell dynamics in high dimensions and can recover known gene regulatory networks from simulated data.
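The simulation-free ingredient can be illustrated with a Brownian-bridge interpolant between paired endpoints: regression targets for score and flow matching are sampled from such closed-form interpolants instead of simulating the learned SDE. The pairing and noise scale below are illustrative, and the entropic optimal-transport coupling step is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def bridge_sample(x0, x1, t, sigma=0.1):
    """Sample from the Brownian bridge between endpoints x0 and x1:
    x_t ~ N((1-t) x0 + t x1, sigma^2 t (1-t) I)."""
    mean = (1 - t) * x0 + t * x1            # linear interpolation of the endpoints
    std = sigma * np.sqrt(t * (1 - t))      # bridge variance vanishes at t = 0 and t = 1
    return mean + std * rng.normal(size=x0.shape)

x0 = np.zeros(3)
x1 = np.ones(3)
mid = bridge_sample(x0, x1, 0.5)
```

Because each x_t is sampled in closed form, the matching losses can be evaluated with plain Monte Carlo over (x0, x1, t), with no SDE solver in the training loop.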
    F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning. (arXiv:2004.11145v2 [cs.LG] UPDATED)
    Traditional centralized multi-agent reinforcement learning (MARL) algorithms are sometimes impractical in complicated applications, due to non-interactivity between agents, the curse of dimensionality, and computation complexity. Hence, several decentralized MARL algorithms are motivated. However, existing decentralized methods only handle the fully cooperative setting where massive information needs to be transmitted in training. The block coordinate gradient descent scheme they used for successive independent actor and critic steps can simplify the calculation, but it causes serious bias. In this paper, we propose a flexible fully decentralized actor-critic MARL framework, which can combine most actor-critic methods and handle the large-scale general cooperative multi-agent setting. A primal-dual hybrid gradient descent type algorithm framework is designed to learn individual agents separately for decentralization. From the perspective of each agent, policy improvement and value evaluation are jointly optimized, which can stabilize multi-agent policy learning. Furthermore, our framework can achieve scalability and stability for large-scale environments and reduce information transmission, by the parameter sharing mechanism and a novel modeling-other-agents method based on theory-of-mind and online supervised learning. Sufficient experiments in the cooperative Multi-agent Particle Environment and StarCraft II show that our decentralized MARL instantiation algorithms perform competitively against conventional centralized and decentralized methods.
    QIGen: Generating Efficient Kernels for Quantized Inference on Large Language Models. (arXiv:2307.03738v1 [cs.LG])
    We present ongoing work on a new automatic code generation approach for supporting quantized generative inference on LLMs such as LLaMA or OPT on off-the-shelf CPUs. Our approach is informed by the target architecture and a performance model, including both hardware characteristics and method-specific accuracy constraints. Results on CPU-based inference for LLaMA models show that our approach can lead to high performance and high accuracy, comparing favorably to the best existing open-source solution. A preliminary implementation is available at https://github.com/IST-DASLab/QIGen.
    Hyperspectral and Multispectral Image Fusion Using the Conditional Denoising Diffusion Probabilistic Model. (arXiv:2307.03423v1 [eess.IV])
    Hyperspectral images (HSI) have a large amount of spectral information reflecting the characteristics of matter, while their spatial resolution is low due to the limitations of imaging technology. Complementary to this are multispectral images (MSI), e.g., RGB images, with high spatial resolution but insufficient spectral bands. Hyperspectral and multispectral image fusion is a technique for acquiring ideal images that have both high spatial and high spectral resolution cost-effectively. Many existing HSI and MSI fusion algorithms rely on known imaging degradation models, which are often not available in practice. In this paper, we propose a deep fusion method based on the conditional denoising diffusion probabilistic model, called DDPM-Fus. Specifically, DDPM-Fus contains a forward diffusion process which gradually adds Gaussian noise to the high spatial resolution HSI (HrHSI) and a reverse denoising process which learns to predict the desired HrHSI from its noisy version, conditioning on the corresponding high spatial resolution MSI (HrMSI) and low spatial resolution HSI (LrHSI). Once training is complete, the proposed DDPM-Fus applies the reverse process to the test HrMSI and LrHSI to generate the fused HrHSI. Experiments conducted on one indoor and two remote sensing datasets show the superiority of the proposed model when compared with other advanced deep learning-based fusion methods. The code of this work will be open-sourced at this address: https://github.com/shuaikaishi/DDPMFus for reproducibility.
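The forward diffusion step that DDPM-style models rely on has a closed form, which can be sketched generically. This is the standard DDPM forward process with an illustrative linear noise schedule, not the conditional fusion model itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_diffuse(x0, t, betas):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I)."""
    alpha_bar = np.cumprod(1.0 - betas)[t]  # cumulative signal-retention factor
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps, eps

betas = np.linspace(1e-4, 0.02, 1000)       # illustrative linear schedule
x0 = rng.normal(size=(4, 4))                # stand-in for an image patch
xt, eps = forward_diffuse(x0, t=999, betas=betas)
```

At the final step the signal term is nearly gone and x_t is essentially pure noise, which is the starting point of the reverse (denoising) process that the network learns, here conditioned on the HrMSI and LrHSI inputs.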
    Polybot: Training One Policy Across Robots While Embracing Variability. (arXiv:2307.03719v1 [cs.RO])
    Reusing large datasets is crucial to scale vision-based robotic manipulators to everyday scenarios due to the high cost of collecting robotic datasets. However, robotic platforms possess varying control schemes, camera viewpoints, kinematic configurations, and end-effector morphologies, posing significant challenges when transferring manipulation skills from one platform to another. To tackle this problem, we propose a set of key design decisions to train a single policy for deployment on multiple robotic platforms. Our framework first aligns the observation and action spaces of our policy across embodiments via utilizing wrist cameras and a unified, but modular codebase. To bridge the remaining domain shift, we align our policy's internal representations across embodiments through contrastive learning. We evaluate our method on a dataset collected over 60 hours spanning 6 tasks and 3 robots with varying joint configurations and sizes: the WidowX 250S, the Franka Emika Panda, and the Sawyer. Our results demonstrate significant improvements in success rate and sample efficiency for our policy when using new task data collected on a different robot, validating our proposed design decisions. More details and videos can be found on our anonymized project website: https://sites.google.com/view/polybot-multirobot
    MALIBO: Meta-learning for Likelihood-free Bayesian Optimization. (arXiv:2307.03565v1 [cs.LG])
    Bayesian optimization (BO) is a popular method to optimize costly black-box functions. While traditional BO optimizes each new target task from scratch, meta-learning has emerged as a way to leverage knowledge from related tasks to optimize new tasks faster. However, existing meta-learning BO methods rely on surrogate models that suffer from scalability issues and are sensitive to observations with different scales and noise types across tasks. Moreover, they often overlook the uncertainty associated with task similarity. This leads to unreliable task adaptation when only limited observations are obtained or when the new tasks differ significantly from the related tasks. To address these limitations, we propose a novel meta-learning BO approach that bypasses the surrogate model and directly learns the utility of queries across tasks. Our method explicitly models task uncertainty and includes an auxiliary model to enable robust adaptation to new tasks. Extensive experiments show that our method demonstrates strong anytime performance and outperforms state-of-the-art meta-learning BO methods in various benchmarks.
    Smoothing the Edges: A General Framework for Smooth Optimization in Sparse Regularization using Hadamard Overparametrization. (arXiv:2307.03571v1 [cs.LG])
    This paper introduces a smooth method for (structured) sparsity in $\ell_q$ and $\ell_{p,q}$ regularized optimization problems. Optimization of these non-smooth and possibly non-convex problems typically relies on specialized procedures. In contrast, our general framework is compatible with prevalent first-order optimization methods like Stochastic Gradient Descent and accelerated variants without any required modifications. This is accomplished through a smooth optimization transfer, comprising an overparametrization of selected model parameters using Hadamard products and a change of penalties. In the overparametrized problem, smooth and convex $\ell_2$ regularization of the surrogate parameters induces non-smooth and non-convex $\ell_q$ or $\ell_{p,q}$ regularization in the original parametrization. We show that our approach yields not only matching global minima but also equivalent local minima. This is particularly useful in non-convex sparse regularization, where finding global minima is NP-hard and local minima are known to generalize well. We provide a comprehensive overview consolidating various literature strands on sparsity-inducing parametrizations and propose meaningful extensions to existing approaches. The feasibility of our approach is evaluated through numerical experiments, which demonstrate that its performance is on par with or surpasses commonly used implementations of convex and non-convex regularization methods.
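In the simplest $\ell_1$ case, the optimization transfer rests on the coordinatewise identity $\min_{u \odot v = w} \tfrac{1}{2}(u^2 + v^2) = |w|$, attained at $u = \mathrm{sign}(w)\sqrt{|w|}$, $v = \sqrt{|w|}$. A quick numerical check of that identity (a sketch of the idea, not the paper's general $\ell_q$ or structured machinery):

```python
import numpy as np

# Hadamard overparametrization: write w = u * v elementwise. Smooth, convex
# l2 penalties on the surrogate parameters (u, v) then induce a non-smooth
# l1 penalty on w at the minimizing factorization.
w = np.array([1.5, -0.25, 0.0, 4.0])
u = np.sign(w) * np.sqrt(np.abs(w))         # minimizing factorization
v = np.sqrt(np.abs(w))
induced = 0.5 * (u**2 + v**2)               # per-coordinate l2 penalty on (u, v)
```

This is why standard first-order methods with ordinary weight decay on (u, v) can optimize an objective that is effectively $\ell_1$-regularized in w.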
    Guideline for Trustworthy Artificial Intelligence -- AI Assessment Catalog. (arXiv:2307.03681v1 [cs.CY])
    Artificial Intelligence (AI) has made impressive progress in recent years and represents a key technology that has a crucial impact on the economy and society. However, it is clear that AI and business models based on it can only reach their full potential if AI applications are developed according to high quality standards and are effectively protected against new AI risks. For instance, AI bears the risk of unfair treatment of individuals when processing personal data e.g., to support credit lending or staff recruitment decisions. The emergence of these new risks is closely linked to the fact that the behavior of AI applications, particularly those based on Machine Learning (ML), is essentially learned from large volumes of data and is not predetermined by fixed programmed rules. Thus, the issue of the trustworthiness of AI applications is crucial and is the subject of numerous major publications by stakeholders in politics, business and society. In addition, there is mutual agreement that the requirements for trustworthy AI, which are often described in an abstract way, must now be made clear and tangible. One challenge to overcome here relates to the fact that the specific quality criteria for an AI application depend heavily on the application context and possible measures to fulfill them in turn depend heavily on the AI technology used. Lastly, practical assessment procedures are needed to evaluate whether specific AI applications have been developed according to adequate quality standards. This AI assessment catalog addresses exactly this point and is intended for two target groups: Firstly, it provides developers with a guideline for systematically making their AI applications trustworthy. Secondly, it guides assessors and auditors on how to examine AI applications for trustworthiness in a structured way.
    Stability and Generalization of Stochastic Compositional Gradient Descent Algorithms. (arXiv:2307.03357v1 [cs.LG])
    Many machine learning tasks can be formulated as a stochastic compositional optimization (SCO) problem, such as reinforcement learning, AUC maximization, and meta-learning, where the objective function involves a nested composition associated with an expectation. While a significant number of studies have been devoted to studying the convergence behavior of SCO algorithms, there is little work on understanding their generalization, i.e., how these learning algorithms built from training examples would behave on future test examples. In this paper, we provide the stability and generalization analysis of stochastic compositional gradient descent algorithms through the lens of algorithmic stability in the framework of statistical learning theory. Firstly, we introduce a stability concept called compositional uniform stability and establish its quantitative relation with generalization for SCO problems. Then, we establish the compositional uniform stability results for two popular stochastic compositional gradient descent algorithms, namely SCGD and SCSC. Finally, we derive dimension-independent excess risk bounds for SCGD and SCSC by trading off their stability results and optimization errors. To the best of our knowledge, these are the first known results on stability and generalization analysis of stochastic compositional gradient descent algorithms.
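A generic SCGD sketch for $\min_x f(\mathbb{E}[g(x;\xi)])$, with a toy problem and illustrative step sizes (this shows the algorithm being analyzed, not the paper's stability analysis):

```python
import numpy as np

rng = np.random.default_rng(0)

def scgd(grad_f, g_sample, jac_g_sample, x0, alpha=0.01, beta=0.1, iters=2000):
    """Stochastic Compositional Gradient Descent for min_x f(E[g(x; xi)]).

    Keeps a running average y of the inner mapping so the outer gradient
    is evaluated at a low-noise estimate of E[g(x)]."""
    x = x0.copy()
    y = g_sample(x)
    for _ in range(iters):
        y = (1 - beta) * y + beta * g_sample(x)          # track E[g(x; xi)]
        x = x - alpha * jac_g_sample(x).T @ grad_f(y)    # chain-rule step at the tracked y
    return x

# Toy problem: g(x; xi) = x + noise, f(y) = 0.5 ||y||^2, so the minimizer is x = 0.
g = lambda x: x + 0.05 * rng.normal(size=x.shape)
jac = lambda x: np.eye(len(x))
gf = lambda y: y
x_hat = scgd(gf, g, jac, np.array([2.0, -1.0]))
```

The auxiliary average y is the distinguishing feature of SCGD: a naive plug-in of a single noisy g-sample into f' would be biased, since f of an average is not the average of f.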
    Mitigating Negative Transfer with Task Awareness for Sexism, Hate Speech, and Toxic Language Detection. (arXiv:2307.03377v1 [cs.CL])
    This paper proposes a novel approach to mitigate the negative transfer problem. In the field of machine learning, the common strategy is to apply the Single-Task Learning approach in order to train a supervised model to solve a specific task. Training a robust model requires a lot of data and a significant amount of computational resources, making this solution unfeasible in cases where data are unavailable or expensive to gather. Therefore another solution, based on the sharing of information between tasks, has been developed: Multi-Task Learning (MTL). Despite the recent developments regarding MTL, the problem of negative transfer has still to be solved. Negative transfer is a phenomenon that occurs when noisy information is shared between tasks, resulting in a drop in performance. This paper proposes a new approach to mitigate the negative transfer problem based on the task awareness concept. The proposed approach diminishes negative transfer while improving performance over the classic MTL solution. Moreover, the proposed approach has been implemented in two unified architectures to detect Sexism, Hate Speech, and Toxic Language in text comments. The proposed architectures set a new state of the art on both the EXIST-2021 and HatEval-2019 benchmarks.
    Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs. (arXiv:2307.03393v1 [cs.LG])
    Learning on graphs has attracted immense attention due to its wide real-world applications. The most popular pipeline for learning on graphs with textual node attributes primarily relies on Graph Neural Networks (GNNs) and utilizes shallow text embeddings as initial node representations, which lack general knowledge and deep semantic understanding. In recent years, Large Language Models (LLMs) have been proven to possess extensive common knowledge and powerful semantic comprehension abilities that have revolutionized existing workflows for handling text data. In this paper, we aim to explore the potential of LLMs in graph machine learning, especially the node classification task, and investigate two possible pipelines: LLMs-as-Enhancers and LLMs-as-Predictors. The former leverages LLMs to enhance nodes' text attributes with their massive knowledge and then generates predictions through GNNs. The latter attempts to directly employ LLMs as standalone predictors. We conduct comprehensive and systematic studies of these two pipelines under various settings. From comprehensive empirical results, we make original observations and find new insights that open new possibilities and suggest promising directions for leveraging LLMs for learning on graphs.  ( 2 min )
    On Formal Feature Attribution and Its Approximation. (arXiv:2307.03380v1 [cs.AI])
    Recent years have witnessed the widespread use of artificial intelligence (AI) algorithms and machine learning (ML) models. Despite their tremendous success, a number of vital problems, such as ML model brittleness, fairness, and the lack of interpretability, warrant active development in explainable artificial intelligence (XAI) and formal ML model verification. The two major lines of work in XAI include feature selection methods, e.g. Anchors, and feature attribution techniques, e.g. LIME and SHAP. Despite their promise, most of the existing feature selection and attribution approaches are susceptible to a range of critical issues, including explanation unsoundness and out-of-distribution sampling. A recent formal approach to XAI (FXAI), although serving as an alternative to the above and free of these issues, suffers from a few other limitations. For instance, besides its scalability limitation, the formal approach is unable to tackle the feature attribution problem. Additionally, a formal explanation, despite being formally sound, is typically quite large, which hampers its applicability in practical settings. Motivated by the above, this paper proposes a way to apply the apparatus of formal XAI to the case of feature attribution based on formal explanation enumeration. Formal feature attribution (FFA) is argued to be advantageous over the existing methods, both formal and non-formal. Given the practical complexity of the problem, the paper then proposes an efficient technique for approximating exact FFA. Finally, it offers experimental evidence of the effectiveness of the proposed approximate FFA in comparison to the existing feature attribution algorithms, not only in terms of feature importance but also in terms of their relative order.  ( 3 min )
    Teaching Arithmetic to Small Transformers. (arXiv:2307.03381v1 [cs.LG])
    Large language models like GPT-4 exhibit emergent capabilities across general-purpose tasks, such as basic arithmetic, when trained on extensive text data, even though these tasks are not explicitly encoded by the unsupervised, next-token prediction objective. This study investigates how small transformers, trained from random initialization, can efficiently learn arithmetic operations such as addition, multiplication, and elementary functions like square root, using the next-token prediction objective. We first demonstrate that conventional training data is not the most effective for arithmetic learning, and simple formatting changes can significantly improve accuracy. This leads to sharp phase transitions as a function of training data scale, which, in some cases, can be explained through connections to low-rank matrix completion. Building on prior work, we then train on chain-of-thought style data that includes intermediate step results. Even in the complete absence of pretraining, this approach significantly and simultaneously improves accuracy, sample complexity, and convergence speed. We also study the interplay between arithmetic and text data during training and examine the effects of few-shot prompting, pretraining, and model scale. Additionally, we discuss length generalization challenges. Our work highlights the importance of high-quality, instructive data that considers the particular characteristics of the next-word prediction objective for rapidly eliciting arithmetic capabilities.  ( 2 min )
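    As a small illustration of the kind of formatting change the abstract describes, the sketch below emits addition samples with the answer's digits reversed, so a next-token model can produce the least-significant digit, and its carry, first. The exact format string is an assumption for illustration, not the paper's.

```python
def format_addition(a, b, reverse_output=True):
    """Render one addition sample for next-token training.

    Reversing the output digits lets a left-to-right model generate the
    least-significant digit first, matching the order in which carries
    are computed in grade-school addition."""
    answer = str(a + b)
    if reverse_output:
        answer = answer[::-1]
    return f"{a}+{b}={answer}"
```

For example, `format_addition(17, 25)` yields `"17+25=24"` (the digits of 42 reversed), while `reverse_output=False` gives the plain `"17+25=42"`.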
    Incentive Allocation in Vertical Federated Learning Based on Bankruptcy Problem. (arXiv:2307.03515v1 [cs.LG])
    Vertical federated learning (VFL) is a promising approach for collaboratively training machine learning models using private data partitioned vertically across different parties. Ideally, in a VFL setting, the active party (the party possessing features of samples with labels) benefits by improving its machine learning model through collaboration with some passive parties (parties possessing additional features of the same samples without labels) in a privacy-preserving manner. However, motivating passive parties to participate in VFL can be challenging. In this paper, we focus on the problem of allocating incentives to the passive parties by the active party based on their contributions to the VFL process. We formulate this problem as a variant of the Bankruptcy Problem from cooperative game theory, whose Nucleolus solution is given by the Talmud's division rule, and solve it accordingly. We evaluate our proposed method on synthetic and real-world datasets and show that it ensures fairness and stability in incentive allocation among passive parties who contribute their data to the federated model. Additionally, we compare our method to the existing solution of calculating Shapley values and show that our approach provides a more efficient solution with fewer computations.  ( 2 min )
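    The Talmud division rule the abstract invokes can be sketched concretely: halve every claim, then apply constrained equal awards (CEA) to the half-claims when the estate is at most their sum, and otherwise award each half-claim plus a constrained-equal-losses (CEL) split of the remainder. The bisection-based sketch below is a generic implementation of that classical rule, not the paper's code.

```python
def _cea(amount, claims, tol=1e-9):
    """Constrained equal awards: award min(c_i, lam) with lam chosen so
    the awards sum to `amount`."""
    lo, hi = 0.0, max(claims)
    while hi - lo > tol:
        lam = (lo + hi) / 2
        if sum(min(c, lam) for c in claims) < amount:
            lo = lam
        else:
            hi = lam
    return [min(c, lo) for c in claims]

def _cel(amount, claims, tol=1e-9):
    """Constrained equal losses: award max(c_i - lam, 0) with lam chosen so
    the awards sum to `amount`."""
    lo, hi = 0.0, max(claims)
    while hi - lo > tol:
        lam = (lo + hi) / 2
        if sum(max(c - lam, 0.0) for c in claims) > amount:
            lo = lam
        else:
            hi = lam
    return [max(c - lo, 0.0) for c in claims]

def talmud_division(estate, claims):
    """Talmud rule: CEA on half-claims for a small estate, otherwise the
    half-claims plus a CEL split of what remains."""
    half = [c / 2 for c in claims]
    if estate <= sum(half):
        return _cea(estate, half)
    extra = _cel(estate - sum(half), half)
    return [h + e for h, e in zip(half, extra)]
```

On the classical example with estate 200 and claims (100, 200, 300), the rule awards (50, 75, 75), matching the Talmudic prescription.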
    Optimal Scalarizations for Sublinear Hypervolume Regret. (arXiv:2307.03288v1 [cs.LG])
    Scalarization is a general technique that can be deployed in any multiobjective setting to reduce multiple objectives into one, such as recently in RLHF for training reward models that align human preferences. Yet some have dismissed this classical approach because linear scalarizations are known to miss concave regions of the Pareto frontier. To that end, we aim to find simple non-linear scalarizations that can explore a diverse set of $k$ objectives on the Pareto frontier, as measured by the dominated hypervolume. We show that hypervolume scalarizations with uniformly random weights are surprisingly optimal for provably minimizing the hypervolume regret, achieving an optimal sublinear regret bound of $O(T^{-1/k})$, with matching lower bounds that preclude any algorithm from doing better asymptotically. As a theoretical case study, we consider the multiobjective stochastic linear bandits problem and demonstrate that by exploiting the sublinear regret bounds of the hypervolume scalarizations, we can derive a novel non-Euclidean analysis that produces improved hypervolume regret bounds of $\tilde{O}( d T^{-1/2} + T^{-1/k})$. We support our theory with the strong empirical performance of simple hypervolume scalarizations, which consistently outperform both the linear and Chebyshev scalarizations, as well as standard multiobjective algorithms in Bayesian optimization, such as EHVI.  ( 2 min )
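    A minimal sketch of the hypervolume scalarization with uniformly random weights, for a maximization problem with the reference point at the origin (the normalization constant is omitted and the exact form is an assumption):

```python
import math, random

def hypervolume_scalarization(y, w):
    """s_w(y) = min_i (max(y_i, 0) / w_i)^k; averaging this scalarization
    over random directions w recovers the dominated hypervolume up to a
    constant, which is what makes it useful for hypervolume regret."""
    k = len(y)
    return min(max(yi, 0.0) / wi for yi, wi in zip(y, w)) ** k

def random_direction(k, rng):
    """Uniform direction on the positive unit sphere, one common weight choice."""
    g = [abs(rng.gauss(0, 1)) for _ in range(k)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]

def pick_point(points, rng, k):
    """One round of the scheme: draw a fresh random weight, then maximize
    the resulting scalarization over the candidate set."""
    w = random_direction(k, rng)
    return max(points, key=lambda y: hypervolume_scalarization(y, w))
```

Unlike a linear scalarization, the `min` inside makes the level sets axis-aligned boxes, which is why concave parts of the Pareto frontier remain reachable.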
    Suppressing unknown disturbances to dynamical systems using machine learning. (arXiv:2307.03690v1 [eess.SY])
    Identifying and suppressing unknown disturbances to dynamical systems is a problem with applications in many different fields. In this Letter, we present a model-free method to identify and suppress an unknown disturbance to an unknown system based only on previous observations of the system under the influence of a known forcing function. We find that, under very mild restrictions on the training function, our method is able to robustly identify and suppress a large class of unknown disturbances. We illustrate our scheme with an example where a chaotic disturbance to the Lorenz system is identified and suppressed.  ( 2 min )
    HoughLaneNet: Lane Detection with Deep Hough Transform and Dynamic Convolution. (arXiv:2307.03494v1 [cs.CV])
    The task of lane detection has garnered considerable attention in the field of autonomous driving due to its complexity. Lanes can present difficulties for detection, as they can be narrow, fragmented, and often obscured by heavy traffic. However, it has been observed that the lanes have a geometrical structure that resembles a straight line, leading to improved lane detection results when utilizing this characteristic. To address this challenge, we propose a hierarchical Deep Hough Transform (DHT) approach that combines all lane features in an image into the Hough parameter space. Additionally, we refine the point selection method and incorporate a Dynamic Convolution Module to effectively differentiate between lanes in the original image. Our network architecture comprises a backbone network, either a ResNet or Pyramid Vision Transformer, a Feature Pyramid Network as the neck to extract multi-scale features, and a hierarchical DHT-based feature aggregation head to accurately segment each lane. By utilizing the lane features in the Hough parameter space, the network learns dynamic convolution kernel parameters corresponding to each lane, allowing the Dynamic Convolution Module to effectively differentiate between lane features. Subsequently, the lane features are fed into the feature decoder, which predicts the final position of the lane. Our proposed network structure demonstrates improved performance in detecting heavily occluded or worn lane images, as evidenced by our extensive experimental results, which show that our method outperforms or is on par with state-of-the-art techniques.  ( 3 min )
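    The Hough parameter space that the paper aggregates lane features into is the classical (rho, theta) line parameterization. As a refresher on that underlying transform (the classical pixel-voting version, not the paper's deep, feature-space variant), each point votes for every line rho = x*cos(theta) + y*sin(theta) passing through it:

```python
import math

def hough_votes(points, num_theta=180, rho_step=1.0):
    """Classical Hough accumulator: each (x, y) point casts one vote per
    discretized angle for the line rho = x*cos(theta) + y*sin(theta)
    through it; collinear points pile votes into the same (rho, theta) bin."""
    acc = {}
    for x, y in points:
        for t in range(num_theta):
            theta = math.pi * t / num_theta
            rho = round((x * math.cos(theta) + y * math.sin(theta)) / rho_step)
            acc[(rho, t)] = acc.get((rho, t), 0) + 1
    return acc
```

Three collinear points on the line y = x all vote into the bin rho = 0 at theta = 3*pi/4 (index 135 with 180 angle bins), so that bin reaches the maximum count.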
    A Vulnerability of Attribution Methods Using Pre-Softmax Scores. (arXiv:2307.03305v1 [cs.LG])
    We discuss a vulnerability involving a category of attribution methods used to provide explanations for the outputs of convolutional neural networks working as classifiers. It is known that networks of this type are vulnerable to adversarial attacks, in which imperceptible perturbations of the input may alter the outputs of the model. In contrast, here we focus on the effects that small modifications of the model may have on the attribution method without altering the model outputs.  ( 2 min )
    CSCLog: A Component Subsequence Correlation-Aware Log Anomaly Detection Method. (arXiv:2307.03359v1 [cs.LG])
    Anomaly detection based on system logs plays an important role in intelligent operations, and it is a challenging task due to the extremely complex log patterns. Existing methods detect anomalies by capturing the sequential dependencies in log sequences but ignore the interactions of subsequences. To this end, we propose CSCLog, a Component Subsequence Correlation-Aware Log anomaly detection method, which not only captures the sequential dependencies in subsequences, but also models the implicit correlations of subsequences. Specifically, subsequences are extracted from log sequences based on components, and the sequential dependencies in subsequences are captured by Long Short-Term Memory Networks (LSTMs). An implicit correlation encoder is introduced to model the implicit correlations of subsequences adaptively. In addition, Graph Convolution Networks (GCNs) are employed to accomplish the information interactions of subsequences. Finally, attention mechanisms are exploited to fuse the embeddings of all subsequences. Extensive experiments on four publicly available log datasets demonstrate the effectiveness of CSCLog, which outperforms the best baseline by an average of 7.41% in Macro F1-Measure.  ( 2 min )
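    The first step CSCLog describes, extracting per-component subsequences from a log sequence, can be sketched as follows; the (component, event) input format is an assumption for illustration:

```python
def component_subsequences(log_events):
    """Group a log event sequence into per-component subsequences,
    keeping each event's position in the original sequence so that
    downstream models can still see the global ordering."""
    subsequences = {}
    for position, (component, event_id) in enumerate(log_events):
        subsequences.setdefault(component, []).append((position, event_id))
    return subsequences
```

Each resulting subsequence would then feed a sequence model (an LSTM in the paper), while the correlations between subsequences are modeled separately.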
    Optimal Bandwidth Selection for DENCLUE. (arXiv:2307.03206v1 [cs.LG])
    In modern industry, clustering algorithms are a daily routine for algorithm engineers. Although clustering algorithms experienced rapid growth before 2010, innovation in the area stagnated after deep learning became the de facto industrial standard for machine learning applications. In 2007, a density-based clustering algorithm named DENCLUE was invented to solve clustering problems for nonlinear data structures. However, its parameter selection problem was largely neglected until 2011. In this paper, we propose a new approach to compute the optimal parameters for the DENCLUE algorithm, and discuss its performance in the experiment section.  ( 2 min )
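    For context, DENCLUE builds a kernel density estimate and climbs it toward density attractors, and the bandwidth h is the parameter whose selection is at issue. The sketch below pairs a 1-D Gaussian kernel density estimate with Silverman's rule-of-thumb bandwidth as a generic baseline; the paper's own selector is different and is not reproduced here.

```python
import math, statistics

def silverman_bandwidth(xs):
    """Rule-of-thumb bandwidth for a 1-D Gaussian kernel: a common
    baseline starting point, not the paper's optimal choice."""
    return 1.06 * statistics.stdev(xs) * len(xs) ** (-1 / 5)

def kde(x, xs, h):
    """Gaussian kernel density estimate: the influence/density function
    that DENCLUE's hill climbing ascends."""
    z = 1.0 / (len(xs) * h * math.sqrt(2 * math.pi))
    return z * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in xs)
```

With a reasonable bandwidth, the estimated density is higher at a cluster center than in the gap between clusters, which is exactly the structure the attractor search exploits.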
    On the Evolution of (Hateful) Memes by Means of Multimodal Contrastive Learning. (arXiv:2212.06573v2 [cs.SI] UPDATED)
    The dissemination of hateful memes online has adverse effects on social media platforms and the real world. Detecting hateful memes is challenging, one of the reasons being the evolutionary nature of memes; new hateful memes can emerge by fusing hateful connotations with other cultural ideas or symbols. In this paper, we propose a framework that leverages multimodal contrastive learning models, in particular OpenAI's CLIP, to identify targets of hateful content and systematically investigate the evolution of hateful memes. We find that semantic regularities exist in CLIP-generated embeddings that describe semantic relationships within the same modality (images) or across modalities (images and text). Leveraging this property, we study how hateful memes are created by combining visual elements from multiple images or fusing textual information with a hateful image. We demonstrate the capabilities of our framework for analyzing the evolution of hateful memes by focusing on antisemitic memes, particularly the Happy Merchant meme. Using our framework on a dataset extracted from 4chan, we find 3.3K variants of the Happy Merchant meme, with some linked to specific countries, persons, or organizations. We envision that our framework can be used to aid human moderators by flagging new variants of hateful memes so that moderators can manually verify them and mitigate the problem of hateful content online.
    AI-UPV at EXIST 2023 -- Sexism Characterization Using Large Language Models Under The Learning with Disagreements Regime. (arXiv:2307.03385v1 [cs.CL])
    With the increasing influence of social media platforms, it has become crucial to develop automated systems capable of detecting instances of sexism and other disrespectful and hateful behaviors to promote a more inclusive and respectful online environment. Nevertheless, these tasks are considerably challenging given the different hate categories and the author's intentions, especially under the learning-with-disagreements regime. This paper describes the AI-UPV team's participation in the EXIST (sEXism Identification in Social neTworks) Lab at CLEF 2023. The proposed approach addresses the task of sexism identification and characterization under the learning-with-disagreements paradigm by training directly from the data with disagreements, without using any aggregated label; performance under both soft and hard evaluations is reported. The proposed system uses large language models (i.e., mBERT and XLM-RoBERTa) and ensemble strategies for sexism identification and classification in English and Spanish. In particular, our system is articulated in three different pipelines. The ensemble approach outperformed the individual large language models, obtaining the best performance under both soft- and hard-label evaluation. This work describes the participation in all three EXIST tasks; under the soft evaluation, our system obtained fourth place in Task 2 and first place in Task 3, with the highest ICM-Soft of -2.32 and a normalized ICM-Soft of 0.79. The source code of our approaches is publicly available at https://github.com/AngelFelipeMP/Sexism-LLM-Learning-With-Disagreement.
    Evaluating Biased Attitude Associations of Language Models in an Intersectional Context. (arXiv:2307.03360v1 [cs.CY])
    Language models are trained on large-scale corpora that embed implicit biases documented in psychology. Valence associations (pleasantness/unpleasantness) of social groups determine the biased attitudes towards groups and concepts in social cognition. Building on this established literature, we quantify how social groups are valenced in English language models using a sentence template that provides an intersectional context. We study biases related to age, education, gender, height, intelligence, literacy, race, religion, sex, sexual orientation, social class, and weight. We present a concept projection approach to capture the valence subspace through contextualized word embeddings of language models. Adapting the projection-based approach to embedding association tests that quantify bias, we find that language models exhibit the most biased attitudes against gender identity, social class, and sexual orientation signals in language. We find that the largest and best-performing model that we study is also more biased, as it effectively captures bias embedded in sociocultural data. We validate the bias evaluation method through its strong performance on an intrinsic valence evaluation task. The approach enables us to measure complex intersectional biases as they are known to manifest in the outputs and applications of language models that perpetuate historical biases. Moreover, our approach contributes to design justice as it studies the associations of groups underrepresented in language, such as transgender and homosexual individuals.
    Merging-Diverging Hybrid Transformer Networks for Survival Prediction in Head and Neck Cancer. (arXiv:2307.03427v1 [eess.IV])
    Survival prediction is crucial for cancer patients as it provides early prognostic information for treatment planning. Recently, deep survival models based on deep learning and medical images have shown promising performance for survival prediction. However, existing deep survival models are not well developed in utilizing multi-modality images (e.g., PET-CT) and in extracting region-specific information (e.g., the prognostic information in Primary Tumor (PT) and Metastatic Lymph Node (MLN) regions). In view of this, we propose a merging-diverging learning framework for survival prediction from multi-modality images. This framework has a merging encoder to fuse multi-modality information and a diverging decoder to extract region-specific information. In the merging encoder, we propose a Hybrid Parallel Cross-Attention (HPCA) block to effectively fuse multi-modality features via parallel convolutional layers and cross-attention transformers. In the diverging decoder, we propose a Region-specific Attention Gate (RAG) block to screen out the features related to lesion regions. Our framework is demonstrated on survival prediction from PET-CT images in Head and Neck (H&N) cancer, by designing an X-shape merging-diverging hybrid transformer network (named XSurv). Our XSurv combines the complementary information in PET and CT images and extracts the region-specific prognostic information in PT and MLN regions. Extensive experiments on the public dataset of HEad and neCK TumOR segmentation and outcome prediction challenge (HECKTOR 2022) demonstrate that our XSurv outperforms state-of-the-art survival prediction methods.
    Distilled Pruning: Using Synthetic Data to Win the Lottery. (arXiv:2307.03364v1 [cs.LG])
    This work introduces a novel approach to pruning deep learning models by using distilled data. Unlike conventional strategies which primarily focus on architectural or algorithmic optimization, our method reconsiders the role of data in these scenarios. Distilled datasets capture essential patterns from larger datasets, and we demonstrate how to leverage this capability to enable a computationally efficient pruning process. Our approach can find sparse, trainable subnetworks (a.k.a. Lottery Tickets) up to 5x faster than Iterative Magnitude Pruning at comparable sparsity on CIFAR-10. The experimental results highlight the potential of using distilled data for resource-efficient neural network pruning, model compression, and neural architecture search.
    DWReCO at CheckThat! 2023: Enhancing Subjectivity Detection through Style-based Data Sampling. (arXiv:2307.03550v1 [cs.CL])
    This paper describes our submission for the subjectivity detection task at the CheckThat! Lab. To tackle class imbalances in the task, we have generated additional training materials with GPT-3 models using prompts of different styles from a subjectivity checklist based on journalistic perspective. We used the extended training set to fine-tune language-specific transformer models. Our experiments in English, German and Turkish demonstrate that different subjective styles are effective across all languages. In addition, we observe that the style-based oversampling is better than paraphrasing in Turkish and English. Lastly, the GPT-3 models sometimes produce lacklustre results when generating style-based texts in non-English languages.
    Steel Surface Roughness Parameter Calculations Using Lasers and Machine Learning Models. (arXiv:2307.03723v1 [cs.LG])
    Control of surface texture in strip steel is essential to meet customer requirements during galvanizing and temper rolling processes. Traditional methods rely on post-production stylus measurements, while on-line techniques offer non-contact and real-time measurements of the entire strip. However, ensuring accurate measurement is imperative for their effective utilization in the manufacturing pipeline. Moreover, accurate on-line measurements enable real-time adjustments of manufacturing processing parameters during production, ensuring consistent quality and the possibility of closed-loop control of the temper mill. In this study, we leverage state-of-the-art machine learning models to transform on-line measurements into a significantly more accurate Ra surface roughness metric. By comparing a selection of data-driven approaches, including both deep learning and non-deep learning methods, to the closed-form transformation, we evaluate their potential for improving surface texture control in temper strip steel manufacturing.
    Roman Numeral Analysis with Graph Neural Networks: Onset-wise Predictions from Note-wise Features. (arXiv:2307.03544v1 [cs.SD])
    Roman numeral analysis is the important task of identifying chords and their functional context in pieces of tonal music. This paper presents a new approach to automatic Roman numeral analysis in symbolic music. While existing techniques rely on an intermediate lossy representation of the score, we propose a new method based on Graph Neural Networks (GNNs) that enables the direct description and processing of each individual note in the score. The proposed architecture can leverage note-wise features and interdependencies between notes but yields onset-wise representations by virtue of our novel edge contraction algorithm. Our results demonstrate that ChordGNN outperforms existing state-of-the-art models, achieving higher accuracy in Roman numeral analysis on the reference datasets. In addition, we investigate variants of our model using proposed techniques such as NADE and post-processing of the chord predictions. The full source code for this work is available at https://github.com/manoskary/chordgnn
    Learning from Heterogeneity: A Dynamic Learning Framework for Hypergraphs. (arXiv:2307.03411v1 [cs.LG])
    Graph neural networks (GNNs) have gained increasing popularity in recent years owing to their capability and flexibility in modeling complex graph-structured data. Among all graph learning methods, hypergraph learning is a technique for exploring implicit higher-order correlations when training the embedding space of the graph. In this paper, we propose a hypergraph learning framework named LFH that is capable of dynamic hyperedge construction and attentive embedding update utilizing the heterogeneity attributes of the graph. Specifically, in our framework, high-quality features are first generated by a pairwise fusion strategy that utilizes explicit graph structure information when generating initial node embeddings. Afterwards, a hypergraph is constructed through the dynamic grouping of implicit hyperedges, followed by the type-specific hypergraph learning process. To evaluate the effectiveness of our proposed framework, we conduct comprehensive experiments on several popular datasets against eleven state-of-the-art models on both node classification and link prediction tasks, which fall into the categories of homogeneous pairwise graph learning, heterogeneous pairwise graph learning, and hypergraph learning. The experimental results demonstrate a significant performance gain (on average 12.5% in node classification and 13.3% in link prediction) compared with recent state-of-the-art methods.
    Region-Wise Attentive Multi-View Representation Learning for Urban Region Embeddings. (arXiv:2307.03212v1 [cs.CV])
    Urban region embedding is an important and yet highly challenging task due to the complexity and constantly changing nature of urban data. To address these challenges, we propose Region-Wise Multi-View Representation Learning (ROMER) to capture multi-view dependencies and learn expressive representations of urban regions without the constraints of rigid neighbourhood region conditions. Our model focuses on learning urban region representations from multi-source urban data. First, we capture the multi-view correlations from mobility flow patterns, POI semantics, and check-in dynamics. Then, we adopt global graph attention networks to learn the similarity of any two vertices in the graph. To comprehensively consider and share features across multiple views, a two-stage fusion module is further proposed that learns fusion weights with external attention over the multi-view embeddings. Extensive experiments on two downstream tasks over real-world datasets demonstrate that our model outperforms state-of-the-art methods by up to 17%.
    DEFT: Exploiting Gradient Norm Difference between Model Layers for Scalable Gradient Sparsification. (arXiv:2307.03500v1 [cs.LG])
    Gradient sparsification is a widely adopted solution for reducing the excessive communication traffic in distributed deep learning. However, most existing gradient sparsifiers have relatively poor scalability because of the considerable computational cost of gradient selection and/or increased communication traffic owing to gradient build-up. To address these challenges, we propose a novel gradient sparsification scheme, DEFT, that partitions the gradient selection task into subtasks and distributes them to workers. DEFT differs from existing sparsifiers, in which every worker selects gradients among all gradients. Consequently, the computational cost can be reduced as the number of workers increases. Moreover, gradient build-up can be eliminated because DEFT allows workers to select gradients in partitions that are non-intersecting between workers. Therefore, even if the number of workers increases, the communication traffic can be maintained as per user requirements. To avoid losing the significance of gradient selection, DEFT selects more gradients in the layers that have a larger gradient norm than the other layers. Because every layer has a different computational load, DEFT allocates layers to workers using a bin-packing algorithm to maintain a balanced load of gradient selection between workers. In our empirical evaluation, DEFT shows a significant improvement over existing sparsifiers in the speed of gradient selection while achieving high convergence performance.
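    The core idea, disjoint per-worker gradient selection, can be sketched as below. For brevity this sketch assigns layers round-robin rather than with the paper's norm-aware bin-packing allocation, and the flat density parameter is an illustrative assumption in place of the norm-weighted budgets.

```python
def deft_select(grads_by_layer, worker_id, num_workers, density=0.25):
    """Each worker runs top-|g| selection only inside its own layers, so the
    selected index sets never intersect across workers and no gradient
    build-up occurs when the selections are aggregated."""
    selected = {}
    for idx, layer in enumerate(sorted(grads_by_layer)):
        if idx % num_workers != worker_id:   # layer owned by another worker
            continue
        grads = grads_by_layer[layer]
        k = max(1, int(len(grads) * density))
        top = sorted(range(len(grads)),
                     key=lambda i: abs(grads[i]), reverse=True)[:k]
        selected[layer] = sorted(top)
    return selected
```

Because each worker only scans its own partition, the per-worker selection cost shrinks as workers are added, which is the scalability property the abstract highlights.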
    Sparse Graphical Linear Dynamical Systems. (arXiv:2307.03210v1 [cs.LG])
    Time-series datasets are central in numerous fields of science and engineering, such as biomedicine, Earth observation, and network analysis. Extensive research exists on state-space models (SSMs), which are powerful mathematical tools that allow for probabilistic and interpretable learning on time series. Estimating the model parameters in SSMs is arguably one of the most complicated tasks, and the inclusion of prior knowledge is known both to ease interpretation and to complicate the inferential tasks. Very recent works have attempted to incorporate a graphical perspective on some of those model parameters, but they present notable limitations that this work addresses. More generally, existing graphical modeling tools are designed to incorporate either static information, focusing on statistical dependencies among independent random variables (e.g., the graphical Lasso approach), or dynamic information, emphasizing causal relationships among time series samples (e.g., graphical Granger approaches). However, there are no joint approaches combining static and dynamic graphical modeling within the context of SSMs. This work proposes a novel approach to fill this gap by introducing a joint graphical modeling framework that bridges the static graphical Lasso model and a causal-based graphical approach for the linear-Gaussian SSM. We present DGLASSO (Dynamic Graphical Lasso), a new inference method within this framework that implements an efficient block alternating majorization-minimization algorithm. The algorithm's convergence is established using modern tools from nonlinear analysis. Experimental validation on synthetic and real weather variability data showcases the effectiveness of the proposed model and inference algorithm.  ( 2 min )
    Discovering Hierarchical Achievements in Reinforcement Learning via Contrastive Learning. (arXiv:2307.03486v1 [cs.LG])
    Discovering achievements with a hierarchical structure on procedurally generated environments poses a significant challenge. This requires agents to possess a broad range of abilities, including generalization and long-term reasoning. Many prior methods are built upon model-based or hierarchical approaches, with the belief that an explicit module for long-term planning would be beneficial for learning hierarchical achievements. However, these methods require an excessive amount of environment interactions or large model sizes, limiting their practicality. In this work, we identify that proximal policy optimization (PPO), a simple and versatile model-free algorithm, outperforms the prior methods with recent implementation practices. Moreover, we find that the PPO agent can predict the next achievement to be unlocked to some extent, though with low confidence. Based on this observation, we propose a novel contrastive learning method, called achievement distillation, that strengthens the agent's capability to predict the next achievement. Our method exhibits a strong capacity for discovering hierarchical achievements and shows state-of-the-art performance on the challenging Crafter environment using fewer model parameters in a sample-efficient regime.  ( 2 min )
    Differential Privacy for Clustering Under Continual Observation. (arXiv:2307.03430v1 [cs.DS])
    We consider the problem of privately clustering a dataset in $\mathbb{R}^d$ that undergoes both insertion and deletion of points. Specifically, we give an $\varepsilon$-differentially private clustering mechanism for the $k$-means objective under continual observation. This is the first approximation algorithm for that problem with an additive error that depends only logarithmically on the number $T$ of updates. The multiplicative error is almost the same as in the non-private setting. To do so, we show how to perform dimension reduction under continual observation and combine it with a differentially private greedy approximation algorithm for $k$-means. We also partially extend our results to the $k$-median problem.  ( 2 min )
    Machine Learning to detect cyber-attacks and discriminating the types of power system disturbances. (arXiv:2307.03323v1 [cs.LG])
    This research proposes a machine learning-based attack detection model for power systems, specifically targeting smart grids. By utilizing data and logs collected from Phasor Measurement Units (PMUs), the model aims to learn system behaviors and effectively identify potential security threats. The proposed approach involves crucial stages including dataset pre-processing, feature selection, model creation, and evaluation. To validate our approach, we used a dataset consisting of 15 separate datasets obtained from different PMUs, together with relay Snort alarms and logs. Three machine learning models (Random Forest, Logistic Regression, and K-Nearest Neighbour) were built and evaluated using various performance metrics. The findings indicate that the Random Forest model achieves the highest performance, with an accuracy of 90.56% in detecting power system disturbances, and has the potential to assist operators in decision-making processes.  ( 2 min )
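    As a concrete illustration of one of the evaluated model families, here is a minimal K-Nearest Neighbour classifier in plain NumPy; it is a generic sketch, not the paper's pipeline, and assumes integer class labels.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Minimal K-Nearest Neighbour classifier: majority vote among the k
    closest training points under Euclidean distance."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)   # distance to every training point
        nearest = y_train[np.argsort(dists)[:k]]      # labels of the k nearest
        preds.append(np.bincount(nearest).argmax())   # majority vote
    return np.array(preds)
```

On two well-separated clusters, a query near each cluster is assigned that cluster's label.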
    Quantification of Uncertainty with Adversarial Models. (arXiv:2307.03217v1 [cs.LG])
    Quantifying uncertainty is important for actionable predictions in real-world applications. A crucial part of predictive uncertainty quantification is the estimation of epistemic uncertainty, which is defined as an integral of the product between a divergence function and the posterior. Current methods such as Deep Ensembles or MC dropout underperform at estimating the epistemic uncertainty, since they primarily consider the posterior when sampling models. We suggest Quantification of Uncertainty with Adversarial Models (QUAM) to better estimate the epistemic uncertainty. QUAM identifies regions where the whole product under the integral is large, not just the posterior. Consequently, QUAM has lower approximation error of the epistemic uncertainty compared to previous methods. Models for which the product is large correspond to adversarial models (not adversarial examples!). Adversarial models have both a high posterior as well as a high divergence between their predictions and that of a reference model. Our experiments show that QUAM excels in capturing epistemic uncertainty for deep learning models and outperforms previous methods on challenging tasks in the vision domain.  ( 2 min )
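    For context, the ensemble baselines that QUAM improves upon estimate epistemic uncertainty from member disagreement via the mutual-information decomposition; a minimal NumPy sketch (the function name is invented here):

```python
import numpy as np

def epistemic_uncertainty(member_probs):
    """Mutual-information decomposition for an ensemble:
    epistemic = H(mean prediction) - mean over members of H(member prediction)."""
    p = np.asarray(member_probs)       # shape (n_members, n_classes)
    eps = 1e-12                        # avoid log(0)
    mean_p = p.mean(axis=0)
    total = -np.sum(mean_p * np.log(mean_p + eps))             # predictive entropy
    aleatoric = -np.mean(np.sum(p * np.log(p + eps), axis=1))  # expected entropy
    return total - aleatoric           # disagreement between members
```

An ensemble whose members agree yields (near-)zero epistemic uncertainty; disagreeing members yield a strictly positive value.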
    Scalable High-Dimensional Multivariate Linear Regression for Feature-Distributed Data. (arXiv:2307.03410v1 [stat.ML])
    Feature-distributed data, i.e., data partitioned by features and stored across multiple computing nodes, are increasingly common in applications with a large number of features. This paper proposes a two-stage relaxed greedy algorithm (TSRGA) for applying multivariate linear regression to such data. The main advantage of TSRGA is that its communication complexity does not depend on the feature dimension, making it highly scalable to very large data sets. In addition, for multivariate response variables, TSRGA can be used to yield low-rank coefficient estimates. The fast convergence of TSRGA is validated by simulation experiments. Finally, we apply the proposed TSRGA in a financial application that leverages unstructured data from the 10-K reports, demonstrating its usefulness in applications with many dense large-dimensional matrices.  ( 2 min )
    ITA: An Energy-Efficient Attention and Softmax Accelerator for Quantized Transformers. (arXiv:2307.03493v1 [cs.AR])
    Transformer networks have emerged as the state-of-the-art approach for natural language processing tasks and are gaining popularity in other domains such as computer vision and audio processing. However, the efficient hardware acceleration of transformer models poses new challenges due to their high arithmetic intensities, large memory requirements, and complex dataflow dependencies. In this work, we propose ITA, a novel accelerator architecture for transformers and related models that targets efficient inference on embedded systems by exploiting 8-bit quantization and an innovative softmax implementation that operates exclusively on integer values. By computing on-the-fly in streaming mode, our softmax implementation minimizes data movement and energy consumption. ITA achieves competitive energy efficiency with respect to state-of-the-art transformer accelerators with 16.9 TOPS/W, while outperforming them in area efficiency with 5.93 TOPS/mm$^2$ in 22 nm fully-depleted silicon-on-insulator technology at 0.8 V.  ( 2 min )
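    ITA's exact hardware softmax is not reproduced here, but the general idea of an integer-only softmax can be sketched by replacing e^x with 2^x so the exponential becomes a bit shift; this is a deliberately crude illustrative approximation in NumPy, not ITA's design.

```python
import numpy as np

def int_softmax(logits, frac_bits=8):
    """Integer-only softmax sketch: approximates e^x by 2^x so the exponential
    becomes a bit shift. Values use fixed-point format with `frac_bits`
    fractional bits; the exponent is truncated to its integer part, which
    keeps everything in integer arithmetic at the cost of accuracy."""
    x = np.asarray(logits, dtype=np.int64)
    x = x - x.max()                              # all exponents now <= 0
    one = np.int64(1) << (2 * frac_bits)         # fixed-point 1.0
    shifts = np.minimum((-x) >> frac_bits, 62)   # integer part of -x, clamped
    num = np.right_shift(one, shifts)            # per-element 2^x in fixed point
    denom = num.sum()
    return (num * (1 << frac_bits)) // denom     # probabilities in fixed point
```

Equal logits map to equal fixed-point probabilities, and larger logits receive larger mass, with the output summing to approximately 2^frac_bits.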
    Personalized Prediction of Recurrent Stress Events Using Self-Supervised Learning on Multimodal Time-Series Data. (arXiv:2307.03337v1 [cs.LG])
    Chronic stress can significantly affect physical and mental health. The advent of wearable technology allows for the tracking of physiological signals, potentially leading to innovative stress prediction and intervention methods. However, challenges such as label scarcity and data heterogeneity render stress prediction difficult in practice. To counter these issues, we have developed a multimodal personalized stress prediction system using wearable biosignal data. We employ self-supervised learning (SSL) to pre-train the models on each subject's data, allowing the models to learn the baseline dynamics of the participant's biosignals prior to fine-tuning the stress prediction task. We test our model on the Wearable Stress and Affect Detection (WESAD) dataset, demonstrating that our SSL models outperform non-SSL models while utilizing less than 5% of the annotations. These results suggest that our approach can personalize stress prediction to each user with minimal annotations. This paradigm has the potential to enable personalized prediction of a variety of recurring health events using complex multimodal data streams.  ( 2 min )
    Goal-Conditioned Predictive Coding as an Implicit Planner for Offline Reinforcement Learning. (arXiv:2307.03406v1 [cs.LG])
    Recent work has demonstrated the effectiveness of formulating decision making as a supervised learning problem on offline-collected trajectories. However, the benefits of performing sequence modeling on trajectory data are not yet clear. In this work we investigate whether sequence modeling has the capability to condense trajectories into useful representations that can contribute to policy learning. To achieve this, we adopt a two-stage framework that first summarizes trajectories with sequence modeling techniques, and then employs these representations to learn a policy along with a desired goal. This design allows many existing supervised offline RL methods to be considered as specific instances of our framework. Within this framework, we introduce Goal-Conditioned Predictive Coding (GCPC), an approach that brings powerful trajectory representations and leads to performant policies. We conduct extensive empirical evaluations on AntMaze, FrankaKitchen and Locomotion environments, and observe that sequence modeling has a significant impact on some decision making tasks. In addition, we demonstrate that GCPC learns a goal-conditioned latent representation about the future, which serves as an "implicit planner", and enables competitive performance on all three benchmarks.  ( 2 min )
    Scalable Multi-Agent Reinforcement Learning for Warehouse Logistics with Robotic and Human Co-Workers. (arXiv:2212.11498v2 [cs.LG] UPDATED)
    We envision a warehouse in which dozens of mobile robots and human pickers work together to collect and deliver items within the warehouse. The fundamental problem we tackle, called the order-picking problem, is how these worker agents must coordinate their movement and actions in the warehouse to maximise performance (e.g. order throughput). Established industry methods using heuristic approaches require large engineering efforts to optimise for innately variable warehouse configurations. In contrast, multi-agent reinforcement learning (MARL) can be flexibly applied to diverse warehouse configurations (e.g. size, layout, number/types of workers, item replenishment frequency), as the agents learn through experience how to optimally cooperate with one another. We develop hierarchical MARL algorithms in which a manager assigns goals to worker agents, and the policies of the manager and workers are co-trained toward maximising a global objective (e.g. pick rate). Our hierarchical algorithms achieve significant gains in sample efficiency and overall pick rates over baseline MARL algorithms in diverse warehouse configurations, and substantially outperform two established industry heuristics for order-picking systems.  ( 2 min )
    Robust Tickets Can Transfer Better: Drawing More Transferable Subnetworks in Transfer Learning. (arXiv:2304.11834v2 [cs.LG] UPDATED)
    Transfer learning leverages feature representations of deep neural networks (DNNs) pretrained on source tasks with rich data to empower effective finetuning on downstream tasks. However, the pretrained models are often prohibitively large for delivering generalizable representations, which limits their deployment on edge devices with constrained resources. To close this gap, we propose a new transfer learning pipeline, which leverages our finding that robust tickets can transfer better, i.e., subnetworks drawn with properly induced adversarial robustness can win better transferability over vanilla lottery ticket subnetworks. Extensive experiments and ablation studies validate that our proposed transfer learning pipeline can achieve enhanced accuracy-sparsity trade-offs across both diverse downstream tasks and sparsity patterns, further enriching the lottery ticket hypothesis.  ( 2 min )
    Harmonizing Feature Attributions Across Deep Learning Architectures: Enhancing Interpretability and Consistency. (arXiv:2307.02150v2 [cs.LG] UPDATED)
    Ensuring the trustworthiness and interpretability of machine learning models is critical to their deployment in real-world applications. Feature attribution methods have gained significant attention, which provide local explanations of model predictions by attributing importance to individual input features. This study examines the generalization of feature attributions across various deep learning architectures, such as convolutional neural networks (CNNs) and vision transformers. We aim to assess the feasibility of utilizing a feature attribution method as a feature detector and examine how these features can be harmonized across multiple models employing distinct architectures but trained on the same data distribution. By exploring this harmonization, we aim to develop a more coherent and optimistic understanding of feature attributions, enhancing the consistency of local explanations across diverse deep-learning models. Our findings highlight the potential for harmonized feature attribution methods to improve interpretability and foster trust in machine learning applications, regardless of the underlying architecture.  ( 2 min )
    Language Models are Bounded Pragmatic Speakers: Understanding RLHF from a Bayesian Cognitive Modeling Perspective. (arXiv:2305.17760v4 [cs.CL] UPDATED)
    How do language models "think"? This paper formulates a probabilistic cognitive model called the bounded pragmatic speaker, which can characterize the operation of different variations of language models. Specifically, we demonstrate that large language models fine-tuned with reinforcement learning from human feedback (Ouyang et al., 2022) embody a model of thought that conceptually resembles a fast-and-slow model (Kahneman, 2011), which psychologists have attributed to humans. We discuss the limitations of reinforcement learning from human feedback as a fast-and-slow model of thought and propose avenues for expanding this framework. In essence, our research highlights the value of adopting a cognitive probabilistic modeling approach to gain insights into the comprehension, evaluation, and advancement of language models.  ( 2 min )
    Learning Homogenization for Elliptic Operators. (arXiv:2306.12006v2 [math.NA] UPDATED)
    Multiscale partial differential equations (PDEs) arise in various applications, and several schemes have been developed to solve them efficiently. Homogenization theory is a powerful methodology that eliminates the small-scale dependence, resulting in simplified equations that are computationally tractable. In the field of continuum mechanics, homogenization is crucial for deriving constitutive laws that incorporate microscale physics in order to formulate balance laws for the macroscopic quantities of interest. However, obtaining homogenized constitutive laws is often challenging as they do not in general have an analytic form and can exhibit phenomena not present on the microscale. In response, data-driven learning of the constitutive law has been proposed as appropriate for this task. However, a major challenge in data-driven learning approaches for this problem has remained unexplored: the impact of discontinuities and corner interfaces in the underlying material. These discontinuities in the coefficients affect the smoothness of the solutions of the underlying equations. Given the prevalence of discontinuous materials in continuum mechanics applications, it is important to address the challenge of learning in this context; in particular to develop underpinning theory to establish the reliability of data-driven methods in this scientific domain. The paper addresses this unexplored challenge by investigating the learnability of homogenized constitutive laws for elliptic operators in the presence of such complexities. Approximation theory is presented, and numerical experiments are performed which validate the theory for the solution operator defined by the cell-problem arising in homogenization for elliptic PDEs.  ( 3 min )
    Adaptive Strategies in Non-convex Optimization. (arXiv:2306.10278v2 [cs.LG] UPDATED)
    An algorithm is said to be adaptive to a certain parameter (of the problem) if it does not need a priori knowledge of such a parameter but performs competitively to those that know it. This dissertation presents our work on adaptive algorithms in following scenarios: 1. In the stochastic optimization setting, we only receive stochastic gradients and the level of noise in evaluating them greatly affects the convergence rate. Tuning is typically required when without prior knowledge of the noise scale in order to achieve the optimal rate. Considering this, we designed and analyzed noise-adaptive algorithms that can automatically ensure (near)-optimal rates under different noise scales without knowing it. 2. In training deep neural networks, the scales of gradient magnitudes in each coordinate can scatter across a very wide range unless normalization techniques, like BatchNorm, are employed. In such situations, algorithms not addressing this problem of gradient scales can behave very poorly. To mitigate this, we formally established the advantage of scale-free algorithms that adapt to the gradient scales and presented its real benefits in empirical experiments. 3. Traditional analyses in non-convex optimization typically rely on the smoothness assumption. Yet, this condition does not capture the properties of some deep learning objective functions, including the ones involving Long Short-Term Memory networks and Transformers. Instead, they satisfy a much more relaxed condition, with potentially unbounded smoothness. Under this condition, we show that a generalized SignSGD algorithm can theoretically match the best-known convergence rates obtained by SGD with gradient clipping but does not need explicit clipping at all, and it can empirically match the performance of Adam and beat others. Moreover, it can also be made to automatically adapt to the unknown relaxed smoothness.  ( 3 min )
    Offline Prioritized Experience Replay. (arXiv:2306.05412v2 [cs.LG] UPDATED)
    Offline reinforcement learning (RL) is challenged by the distributional shift problem. To address this problem, existing works mainly focus on designing sophisticated policy constraints between the learned policy and the behavior policy. However, these constraints are applied equally to well-performing and inferior actions through uniform sampling, which might negatively affect the learned policy. To alleviate this issue, we propose Offline Prioritized Experience Replay (OPER), featuring a class of priority functions designed to prioritize highly-rewarding transitions, making them more frequently visited during training. Through theoretical analysis, we show that this class of priority functions induces an improved behavior policy, and when constrained to this improved policy, a policy-constrained offline RL algorithm is likely to yield a better solution. We develop two practical strategies to obtain priority weights by estimating advantages based on a fitted value network (OPER-A) or utilizing trajectory returns (OPER-R) for quick computation. OPER is a plug-and-play component for offline RL algorithms. As case studies, we evaluate OPER on five different algorithms, including BC, TD3+BC, Onestep RL, CQL, and IQL. Extensive experiments demonstrate that both OPER-A and OPER-R significantly improve the performance for all baseline methods. Codes and priority weights are available at https://github.com/sail-sg/OPER.  ( 2 min )
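    The return-based variant (OPER-R) can be sketched as a softmax over trajectory returns that biases replay sampling toward high-return data; a minimal NumPy illustration with invented function names, not the authors' released implementation:

```python
import numpy as np

def return_priority_weights(trajectory_returns, temperature=1.0):
    """Turn per-transition trajectory returns into sampling priorities via a
    softmax, so high-return transitions are replayed more often."""
    r = np.asarray(trajectory_returns, dtype=float)
    z = (r - r.max()) / temperature          # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()                       # normalized priority weights

def sample_transitions(weights, n, rng):
    """Sample transition indices proportionally to their priority weights."""
    return rng.choice(len(weights), size=n, p=weights)
```

Transitions from higher-return trajectories receive strictly larger weights and dominate the sampled batch.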
    Multiclass Online Learning and Uniform Convergence. (arXiv:2303.17716v2 [cs.LG] UPDATED)
    We study multiclass classification in the agnostic adversarial online learning setting. As our main result, we prove that any multiclass concept class is agnostically learnable if and only if its Littlestone dimension is finite. This solves an open problem studied by Daniely, Sabato, Ben-David, and Shalev-Shwartz (2011,2015) who handled the case when the number of classes (or labels) is bounded. We also prove a separation between online learnability and online uniform convergence by exhibiting an easy-to-learn class whose sequential Rademacher complexity is unbounded. Our learning algorithm uses the multiplicative weights algorithm, with a set of experts defined by executions of the Standard Optimal Algorithm on subsequences of size Littlestone dimension. We argue that the best expert has regret at most Littlestone dimension relative to the best concept in the class. This differs from the well-known covering technique of Ben-David, P\'{a}l, and Shalev-Shwartz (2009) for binary classification, where the best expert has regret zero.  ( 2 min )
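    The multiplicative weights scheme at the core of the learning algorithm is standard; a minimal NumPy sketch over a fixed set of experts (the construction of experts from executions of the Standard Optimal Algorithm is omitted):

```python
import numpy as np

def multiplicative_weights(loss_matrix, eta=0.5):
    """Multiplicative weights over experts: at each round, play the weighted
    mixture of experts and downweight each expert by exp(-eta * loss)."""
    losses = np.asarray(loss_matrix, dtype=float)   # shape (rounds, n_experts)
    n = losses.shape[1]
    w = np.ones(n)                                  # uniform initial weights
    cumulative = 0.0
    for round_losses in losses:
        p = w / w.sum()                             # current mixture
        cumulative += p @ round_losses              # learner's expected loss
        w *= np.exp(-eta * round_losses)            # exponential reweighting
    return cumulative, w / w.sum()
```

Against a consistently good expert, the weights concentrate on it and the learner's cumulative loss stays close to that expert's.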
    Highly Accurate Quantum Chemical Property Prediction with Uni-Mol+. (arXiv:2303.16982v2 [physics.chem-ph] UPDATED)
    Recent developments in deep learning have made remarkable progress in speeding up the prediction of quantum chemical (QC) properties by removing the need for expensive electronic structure calculations like density functional theory. However, previous methods learned from 1D SMILES sequences or 2D molecular graphs failed to achieve high accuracy as QC properties primarily depend on the 3D equilibrium conformations optimized by electronic structure methods, far different from the sequence-type and graph-type data. In this paper, we propose a novel approach called Uni-Mol+ to tackle this challenge. Uni-Mol+ first generates a raw 3D molecule conformation from inexpensive methods such as RDKit. Then, the raw conformation is iteratively updated to its target DFT equilibrium conformation using neural networks, and the learned conformation will be used to predict the QC properties. To effectively learn this update process towards the equilibrium conformation, we introduce a two-track Transformer model backbone and train it with the QC property prediction task. We also design a novel approach to guide the model's training process. Our extensive benchmarking results demonstrate that the proposed Uni-Mol+ significantly improves the accuracy of QC property prediction in various datasets. We have made the code and model publicly available at \url{https://github.com/dptech-corp/Uni-Mol}.  ( 2 min )
    Identifying Patient-Specific Root Causes with the Heteroscedastic Noise Model. (arXiv:2205.13085v2 [stat.ML] UPDATED)
    Complex diseases are caused by a multitude of factors that may differ between patients even within the same diagnostic category. A few underlying root causes may nevertheless initiate the development of disease within each patient. We therefore focus on identifying patient-specific root causes of disease, which we equate to the sample-specific predictivity of the exogenous error terms in a structural equation model. We generalize from the linear setting to the heteroscedastic noise model where $Y = m(X) + \varepsilon\sigma(X)$ with non-linear functions $m(X)$ and $\sigma(X)$ representing the conditional mean and mean absolute deviation, respectively. This model preserves identifiability but introduces non-trivial challenges that require a customized algorithm called Generalized Root Causal Inference (GRCI) to extract the error terms correctly. GRCI recovers patient-specific root causes more accurately than existing alternatives.  ( 2 min )
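    Given fitted estimates of the conditional mean and scale, recovering the exogenous error terms of the heteroscedastic noise model is a one-line computation; the sketch below assumes $m$ and $\sigma$ are already estimated (GRCI's actual estimation procedure is not shown):

```python
import numpy as np

def extract_errors(y, x, m, sigma):
    """Recover the exogenous error terms of the heteroscedastic noise model
    Y = m(X) + eps * sigma(X), given (fitted) mean and scale functions."""
    return (y - m(x)) / sigma(x)
```

With the true $m$ and $\sigma$, the recovered terms match the generating noise exactly.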
    On discrete symmetries of robotics systems: A group-theoretic and data-driven analysis. (arXiv:2302.10433v3 [cs.RO] UPDATED)
    We present a comprehensive study on discrete morphological symmetries of dynamical systems, which are commonly observed in biological and artificial locomoting systems, such as legged, swimming, and flying animals/robots/virtual characters. These symmetries arise from the presence of one or more planes/axis of symmetry in the system's morphology, resulting in harmonious duplication and distribution of body parts. Significantly, we characterize how morphological symmetries extend to symmetries in the system's dynamics, optimal control policies, and in all proprioceptive and exteroceptive measurements related to the system's dynamics evolution. In the context of data-driven methods, symmetry represents an inductive bias that justifies the use of data augmentation or symmetric function approximators. To tackle this, we present a theoretical and practical framework for identifying the system's morphological symmetry group $\mathcal{G}$ and characterizing the symmetries in proprioceptive and exteroceptive data measurements. We then exploit these symmetries using data augmentation and $\mathcal{G}$-equivariant neural networks. Our experiments on both synthetic and real-world applications provide empirical evidence of the advantageous outcomes resulting from the exploitation of these symmetries, including improved sample efficiency, enhanced generalization, and reduction of trainable parameters.  ( 2 min )
    RObotic MAnipulation Network (ROMAN) $\unicode{x2013}$ Hybrid Hierarchical Learning for Solving Complex Sequential Tasks. (arXiv:2307.00125v2 [cs.RO] UPDATED)
    Solving long sequential tasks poses a significant challenge in embodied artificial intelligence. Enabling a robotic system to perform diverse sequential tasks with a broad range of manipulation skills is an active area of research. In this work, we present a Hybrid Hierarchical Learning framework, the Robotic Manipulation Network (ROMAN), to address the challenge of solving multiple complex tasks over long time horizons in robotic manipulation. ROMAN achieves task versatility and robust failure recovery by integrating behavioural cloning, imitation learning, and reinforcement learning. It consists of a central manipulation network that coordinates an ensemble of various neural networks, each specialising in distinct re-combinable sub-tasks to generate their correct in-sequence actions for solving complex long-horizon manipulation tasks. Experimental results show that by orchestrating and activating these specialised manipulation experts, ROMAN generates correct sequential activations for accomplishing long sequences of sophisticated manipulation tasks and achieving adaptive behaviours beyond demonstrations, while exhibiting robustness to various sensory noises. These results demonstrate the significance and versatility of ROMAN's dynamic adaptability featuring autonomous failure recovery capabilities, and highlight its potential for various autonomous manipulation tasks that demand adaptive motor skills.  ( 2 min )
    Federated Variational Inference Methods for Structured Latent Variable Models. (arXiv:2302.03314v2 [stat.ML] UPDATED)
    Federated learning methods enable model training across distributed data sources without data leaving their original locations and have gained increasing interest in various fields. However, existing approaches are limited, excluding many structured probabilistic models. We present a general and elegant solution based on structured variational inference, widely used in Bayesian machine learning, adapted for the federated setting. Additionally, we provide a communication-efficient variant analogous to the canonical FedAvg algorithm. The proposed algorithms' effectiveness is demonstrated, and their performance is compared with hierarchical Bayesian neural networks and topic models.  ( 2 min )
    Reward-Respecting Subtasks for Model-Based Reinforcement Learning. (arXiv:2202.03466v3 [cs.LG] UPDATED)
    To achieve the ambitious goals of artificial intelligence, reinforcement learning must include planning with a model of the world that is abstract in state and time. Deep learning has made progress with state abstraction, but temporal abstraction has rarely been used, despite extensively developed theory based on the options framework. One reason for this is that the space of possible options is immense, and the methods previously proposed for option discovery do not take into account how the option models will be used in planning. Options are typically discovered by posing subsidiary tasks, such as reaching a bottleneck state or maximizing the cumulative sum of a sensory signal other than reward. Each subtask is solved to produce an option, and then a model of the option is learned and made available to the planning process. In most previous work, the subtasks ignore the reward on the original problem, whereas we propose subtasks that use the original reward plus a bonus based on a feature of the state at the time the option terminates. We show that option models obtained from such reward-respecting subtasks are much more likely to be useful in planning than eigenoptions, shortest path options based on bottleneck states, or reward-respecting options generated by the option-critic. Reward respecting subtasks strongly constrain the space of options and thereby also provide a partial solution to the problem of option discovery. Finally, we show how values, policies, options, and models can all be learned online and off-policy using standard algorithms and general value functions.  ( 3 min )
    k-strip: A novel segmentation algorithm in k-space for the application of skull stripping. (arXiv:2205.09706v2 [eess.IV] UPDATED)
    Objectives: Present a novel deep learning-based skull stripping algorithm for magnetic resonance imaging (MRI) that works directly in the information rich k-space. Materials and Methods: Using two datasets from different institutions with a total of 36,900 MRI slices, we trained a deep learning-based model to work directly with the complex raw k-space data. Skull stripping performed by HD-BET (Brain Extraction Tool) in the image domain was used as the ground truth. Results: Both datasets were very similar to the ground truth (DICE scores of 92\%-98\% and Hausdorff distances of under 5.5 mm). Results on slices above the eye-region reach DICE scores of up to 99\%, while the accuracy drops in regions around the eyes and below, with partially blurred output. The output of k-strip often smoothed edges at the demarcation to the skull. Binary masks are created with an appropriate threshold. Conclusion: With this proof-of-concept study, we were able to show the feasibility of working in the k-space frequency domain, preserving phase information, with consistent results. Future research should be dedicated to discovering additional ways the k-space can be used for innovative image analysis and further workflows.  ( 3 min )
    Avoiding Post-Processing with Event-Based Detection in Biomedical Signals. (arXiv:2209.11007v2 [eess.SP] UPDATED)
    Objective: Finding events of interest is a common task in biomedical signal processing. The detection of epileptic seizures and signal artefacts are two key examples. Epoch-based classification is the typical machine learning framework to detect such signal events because of the straightforward application of classical machine learning techniques. Usually, post-processing is required to achieve good performance and enforce temporal dependencies. Designing the right post-processing scheme to convert these classification outputs into events is a tedious, and labor-intensive element of this framework. Methods: We propose an event-based modeling framework that directly works with events as learning targets, stepping away from ad-hoc post-processing schemes to turn model outputs into events. We illustrate the practical power of this framework on simulated data and real-world data, comparing it to epoch-based modeling approaches. Results: We show that event-based modeling (without post-processing) performs on par with or better than epoch-based modeling with extensive post-processing. Conclusion: These results show the power of treating events as direct learning targets, instead of using ad-hoc post-processing to obtain them, severely reducing design effort. Significance: The event-based modeling framework can easily be applied to other event detection problems in signal processing, removing the need for intensive task-specific post-processing.  ( 2 min )
    Stability Analysis Framework for Particle-based Distance GANs with Wasserstein Gradient Flow. (arXiv:2307.01879v2 [cs.LG] UPDATED)
    In this paper, we investigate the training process of generative networks that use a type of probability density distance named particle-based distance as the objective function, e.g. MMD GAN, Cram\'er GAN, EIEG GAN. However, these GANs often suffer from the problem of unstable training. In this paper, we analyze the stability of the training process of these GANs from the perspective of probability density dynamics. In our framework, we regard the discriminator $D$ in these GANs as a feature transformation mapping that maps high dimensional data into a feature space, while the generator $G$ maps random variables to samples that resemble real data in terms of feature space. This perspective enables us to perform stability analysis for the training of GANs using the Wasserstein gradient flow of the probability density function. We find that the training process of the discriminator is usually unstable due to the formulation of $\min_G \max_D E(G, D)$ in GANs. To address this issue, we add a stabilizing term in the discriminator loss function. We conduct experiments to validate our stability analysis and stabilizing method.  ( 2 min )
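    The particle-based distances discussed here include the squared maximum mean discrepancy; a minimal NumPy implementation with an RBF kernel (the bandwidth is chosen arbitrarily for illustration):

```python
import numpy as np

def mmd2_rbf(x, y, bandwidth=1.0):
    """Squared maximum mean discrepancy with an RBF kernel, the particle-based
    distance underlying objectives such as the MMD GAN loss."""
    def k(a, b):
        # Pairwise squared Euclidean distances, then Gaussian kernel.
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()
```

The distance is zero for identical samples and grows as the two sample clouds separate.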
    Evaluating Similitude and Robustness of Deep Image Denoising Models via Adversarial Attack. (arXiv:2306.16050v2 [cs.CV] UPDATED)
    Deep neural networks (DNNs) have shown superior performance compared to traditional image denoising algorithms. However, DNNs are inevitably vulnerable to adversarial attacks. In this paper, we propose an adversarial attack method named denoising-PGD which can successfully attack all the current deep denoising models while keeping the noise distribution almost unchanged. We surprisingly find that the current mainstream non-blind denoising models (DnCNN, FFDNet, ECNDNet, BRDNet), blind denoising models (DnCNN-B, Noise2Noise, RDDCNN-B, FAN), plug-and-play (DPIR, CurvPnP) and unfolding denoising models (DeamNet) almost share the same adversarial sample set on both grayscale and color images, respectively. A shared adversarial sample set indicates that all these models behave similarly in the neighborhood of the test samples. Thus, we further propose an indicator to measure the local similarity of models, called robustness similitude. Non-blind denoising models are found to have high robustness similitude across each other, while hybrid-driven models are also found to have high robustness similitude with pure data-driven non-blind denoising models. According to our robustness assessment, data-driven non-blind denoising models are the most robust. We use adversarial training to mitigate the vulnerability to adversarial attacks. Moreover, the model-driven image denoiser BM3D shows resistance to adversarial attacks.  ( 2 min )
    A Memetic Algorithm with Reinforcement Learning for Sociotechnical Production Scheduling. (arXiv:2212.10936v4 [cs.LG] UPDATED)
    This interdisciplinary article presents a memetic algorithm that applies deep reinforcement learning (DRL) to solve practically oriented dual resource constrained flexible job shop scheduling problems (DRC-FJSSP). From research projects in industry, we recognize the need to consider flexible machines, flexible human workers, worker capabilities, setup and processing operations, material arrival times, complex job paths with parallel tasks for bill of material (BOM) manufacturing, sequence-dependent setup times, and (partially) automated tasks in human-machine collaboration. In recent years, there has been extensive research on metaheuristics and DRL techniques, but it has focused on simple scheduling environments, and few approaches combine metaheuristics and DRL to generate schedules more reliably and efficiently. In this paper, we first formulate a DRC-FJSSP to map complex industry requirements beyond traditional job shop models. Then we propose a scheduling framework integrating a discrete event simulation (DES) for schedule evaluation, considering parallel computing and multicriteria optimization. Here, a memetic algorithm is enriched with DRL to improve sequencing and assignment decisions. Through numerical experiments with real-world production data, we confirm that the framework generates feasible schedules efficiently and reliably for a balanced optimization of makespan (MS) and total tardiness (TT). Utilizing DRL instead of random metaheuristic operations leads to better results in fewer algorithm iterations and outperforms traditional approaches in such complex environments.  ( 3 min )
    Bicriteria Approximation Algorithms for Priority Matroid Median. (arXiv:2210.01888v2 [cs.DS] UPDATED)
    Fairness considerations have motivated new clustering problems and algorithms in recent years. In this paper we consider the Priority Matroid Median problem, which generalizes the recently studied Priority $k$-Median problem. The input consists of a set of facilities $\mathcal{F}$ and a set of clients $\mathcal{C}$ that lie in a metric space $(\mathcal{F} \cup \mathcal{C},d)$, and a matroid $\mathcal{M}=(\mathcal{F},\mathcal{I})$ over the facilities. In addition, each client $j$ has a specified radius $r_j \ge 0$ and each facility $i \in \mathcal{F}$ has an opening cost $f_i$. The goal is to choose a subset $S \subseteq \mathcal{F}$ of facilities to minimize $\sum_{i \in S} f_i + \sum_{j \in \mathcal{C}} d(j,S)$ subject to two constraints: (i) $S$ is an independent set in $\mathcal{M}$ (that is, $S \in \mathcal{I}$) and (ii) for each client $j$, its distance to an open facility is at most $r_j$ (that is, $d(j,S) \le r_j$). For this problem we describe the first bicriteria $(c_1,c_2)$ approximations for fixed constants $c_1,c_2$: the radius constraints of the clients are violated by at most a factor of $c_1$ and the objective cost is at most $c_2$ times the optimum cost. We also improve the previously known bicriteria approximation for the uniform radius setting ($r_j := L$ $\forall j \in \mathcal{C}$).  ( 2 min )
    Creativity of AI: Hierarchical Planning Model Learning for Facilitating Deep Reinforcement Learning. (arXiv:2112.09836v2 [cs.AI] UPDATED)
    Despite achieving great success in real-world applications, Deep Reinforcement Learning (DRL) still suffers from three critical issues: limited data efficiency, lack of interpretability, and limited transferability. Recent research shows that embedding symbolic knowledge into DRL is promising in addressing those challenges. Inspired by this, we introduce a novel deep reinforcement learning framework with symbolic options. Our framework features a loop training procedure that guides policy improvement by planning with planning models (including action models and hierarchical task network models) and symbolic options learned automatically from interactive trajectories. The learned symbolic options alleviate the need for dense expert domain knowledge and provide inherent interpretability of policies. Moreover, transferability and data efficiency can be further improved by planning with the symbolic planning models. To validate the effectiveness of our framework, we conduct experiments on two domains, Montezuma's Revenge and Office World. The results demonstrate comparable performance with improved data efficiency, interpretability, and transferability.  ( 2 min )
    Composite Optimization Algorithms for Sigmoid Networks. (arXiv:2303.00589v3 [math.OC] UPDATED)
    In this paper, we use composite optimization algorithms to train sigmoid networks. We equivalently transform the training of sigmoid networks into a convex composite optimization problem and propose composite optimization algorithms based on linearized proximal algorithms and the alternating direction method of multipliers. Under the assumptions of weak sharp minima and a regularity condition, the algorithms are guaranteed to converge to a globally optimal solution of the objective function even for non-convex and non-smooth problems. Furthermore, the convergence results can be directly related to the amount of training data and provide a general guide for setting the size of sigmoid networks. Numerical experiments on Franke's function fitting and handwritten digit recognition show that the proposed algorithms perform satisfactorily and robustly.  ( 2 min )
    Continuous-Time Functional Diffusion Processes. (arXiv:2303.00800v2 [cs.LG] UPDATED)
    We introduce Functional Diffusion Processes (FDPs), which generalize score-based diffusion models to infinite-dimensional function spaces. FDPs require a new mathematical framework to describe the forward and backward dynamics, and several extensions to derive practical training objectives. These include infinite-dimensional versions of Girsanov theorem, in order to be able to compute an ELBO, and of the sampling theorem, in order to guarantee that functional evaluations in a countable set of points are equivalent to infinite-dimensional functions. We use FDPs to build a new breed of generative models in function spaces, which do not require specialized network architectures, and that can work with any kind of continuous data. Our results on real data show that FDPs achieve high-quality image generation, using a simple MLP architecture with orders of magnitude fewer parameters than existing diffusion models.  ( 2 min )
    TabLeak: Tabular Data Leakage in Federated Learning. (arXiv:2210.01785v2 [cs.LG] UPDATED)
    While federated learning (FL) promises to preserve privacy, recent works in the image and text domains have shown that training updates leak private client data. However, most high-stakes applications of FL (e.g., in healthcare and finance) use tabular data, where the risk of data leakage has not yet been explored. A successful attack for tabular data must address two key challenges unique to the domain: (i) obtaining a solution to a high-variance mixed discrete-continuous optimization problem, and (ii) enabling human assessment of the reconstruction, since, unlike for image and text data, direct human inspection is not possible. In this work, we address these challenges and propose TabLeak, the first comprehensive reconstruction attack on tabular data. TabLeak is based on two key contributions: (i) a method which leverages a softmax relaxation and pooled ensembling to solve the optimization problem, and (ii) an entropy-based uncertainty quantification scheme to enable human assessment. We evaluate TabLeak on four tabular datasets for both FedSGD and FedAvg training protocols, and show that it successfully breaks several settings previously deemed safe. For instance, we extract large subsets of private data at >90% accuracy even at the large batch size of 128. Our findings demonstrate that current high-stakes tabular FL is excessively vulnerable to leakage attacks.  ( 2 min )
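The two ingredients named above, the softmax relaxation of discrete features and the entropy-based uncertainty score, can be sketched minimally (the logit values below are illustrative):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0.0)

# Softmax relaxation: a categorical feature with k values is optimized as
# continuous logits rather than a hard one-hot choice; the entropy of the
# relaxed distribution then serves as the per-feature uncertainty score
# shown to the human assessor (values below are illustrative).
p_confident = softmax([4.0, 0.0, 0.0])
p_uncertain = softmax([0.1, 0.0, 0.05])
```

A low-entropy relaxed feature signals a reconstruction the attacker can trust; a near-uniform one flags a feature that should be discounted during human assessment.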
    Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning. (arXiv:2301.10886v4 [cs.LG] UPDATED)
    We present AIRS: Automatic Intrinsic Reward Shaping, which intelligently and adaptively provides high-quality intrinsic rewards to enhance exploration in reinforcement learning (RL). More specifically, AIRS selects a shaping function from a predefined set based on the estimated task return in real time, providing reliable exploration incentives and alleviating the biased objective problem. Moreover, we develop an intrinsic reward toolkit to provide efficient and reliable implementations of diverse intrinsic reward approaches. We test AIRS on various tasks from MiniGrid, Procgen, and the DeepMind Control Suite. Extensive simulations demonstrate that AIRS can outperform the benchmarked schemes and achieve superior performance with a simple architecture.  ( 2 min )
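The selection step can be sketched as a bandit-style choice over a predefined set of shaping functions; the function names and state fields below are hypothetical, not the paper's exact toolkit:

```python
import random

# Hypothetical predefined set of intrinsic-reward shaping functions.
SHAPING_FNS = {
    "count_based": lambda s: 1.0 / (1.0 + s["visits"]),
    "curiosity":   lambda s: s["pred_error"],
    "none":        lambda s: 0.0,
}

def select_shaping(return_estimates, eps=0.1, rng=None):
    """Epsilon-greedy choice of the shaping function whose estimated task
    return is currently highest, mirroring AIRS's real-time selection."""
    rng = rng or random.Random(0)
    if rng.random() < eps:
        return rng.choice(sorted(SHAPING_FNS))
    return max(return_estimates, key=return_estimates.get)
```

Picking the function by estimated *task* return, rather than fixing one intrinsic reward for the whole run, is what alleviates the biased objective problem the abstract mentions.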
    When and How to Fool Explainable Models (and Humans) with Adversarial Examples. (arXiv:2107.01943v2 [cs.LG] UPDATED)
    Reliable deployment of machine learning models such as neural networks continues to be challenging due to several limitations. Some of the main shortcomings are the lack of interpretability and the lack of robustness against adversarial examples or out-of-distribution inputs. In this exploratory review, we explore the possibilities and limits of adversarial attacks for explainable machine learning models. First, we extend the notion of adversarial examples to fit in explainable machine learning scenarios, in which the inputs, the output classifications and the explanations of the model's decisions are assessed by humans. Next, we propose a comprehensive framework to study whether (and how) adversarial examples can be generated for explainable models under human assessment, introducing and illustrating novel attack paradigms. In particular, our framework considers a wide range of relevant yet often ignored factors such as the type of problem, the user expertise or the objective of the explanations, in order to identify the attack strategies that should be adopted in each scenario to successfully deceive the model (and the human). The intention of these contributions is to serve as a basis for a more rigorous and realistic study of adversarial examples in the field of explainable machine learning.  ( 3 min )
    Layer Ensembles. (arXiv:2210.04882v3 [cs.LG] UPDATED)
    Deep Ensembles, as a type of Bayesian Neural Network, can be used to estimate uncertainty in predictions by collecting votes from each member network and computing the difference among those predictions. In this paper, we introduce a method for uncertainty estimation that considers a set of independent categorical distributions for each layer of the network, giving many more possible samples with overlapped layers than regular Deep Ensembles. We further introduce an optimized inference procedure that reuses common layer outputs, achieving up to a 19x speed-up and reducing memory usage quadratically. We also show that the method can be further improved by ranking samples, resulting in models that require less memory and time to run while achieving higher uncertainty quality than Deep Ensembles.  ( 2 min )
    Black-Box Batch Active Learning for Regression. (arXiv:2302.08981v2 [cs.LG] UPDATED)
    Batch active learning is a popular approach for efficiently training machine learning models on large, initially unlabelled datasets by repeatedly acquiring labels for batches of data points. However, many recent batch active learning methods are white-box approaches and are often limited to differentiable parametric models: they score unlabeled points using acquisition functions based on model embeddings or first- and second-order derivatives. In this paper, we propose black-box batch active learning for regression tasks as an extension of white-box approaches. Crucially, our method only relies on model predictions. This approach is compatible with a wide range of machine learning models, including regular and Bayesian deep learning models and non-differentiable models such as random forests. It is rooted in Bayesian principles and utilizes recent kernel-based approaches. This allows us to extend a wide range of existing state-of-the-art white-box batch active learning methods (BADGE, BAIT, LCMD) to black-box models. We demonstrate the effectiveness of our approach through extensive experimental evaluations on regression datasets, achieving surprisingly strong performance compared to white-box approaches for deep learning models.  ( 2 min )
    Decentralized Learning over Wireless Networks: The Effect of Broadcast with Random Access. (arXiv:2305.07368v2 [cs.NI] UPDATED)
    In this work, we focus on the communication aspect of decentralized learning, which involves multiple agents training a shared machine learning model using decentralized stochastic gradient descent (D-SGD) over distributed data. In particular, we investigate the impact of broadcast transmission and probabilistic random access policy on the convergence performance of D-SGD, considering the broadcast nature of wireless channels and the link dynamics in the communication topology. Our results demonstrate that optimizing the access probability to maximize the expected number of successful links is a highly effective strategy for accelerating the system convergence.  ( 2 min )
    Breadth-First Pipeline Parallelism. (arXiv:2211.05953v2 [cs.DC] UPDATED)
    We introduce Breadth-First Pipeline Parallelism, a novel training schedule which optimizes the combination of pipeline and data parallelism. Breadth-First Pipeline Parallelism lowers training time, cost and memory usage by combining a high GPU utilization with a small batch size per GPU, and by making use of fully sharded data parallelism. Experimentally, we observed an increase of up to 43% in training throughput for a 52 billion-parameter model using a small batch size per GPU compared to Megatron-LM, which would reduce the training time and cost by the same amount on a large GPU cluster.  ( 2 min )
    GRAPHSHAP: Explaining Identity-Aware Graph Classifiers Through the Language of Motifs. (arXiv:2202.08815v2 [cs.LG] UPDATED)
    Most methods for explaining black-box classifiers (e.g. on tabular data, images, or time series) rely on measuring the impact that removing/perturbing features has on the model output. This forces the explanation language to match the classifier's feature space. However, when dealing with graph data, in which the basic features correspond to the edges describing the graph structure, this matching between feature space and explanation language might not be appropriate. Decoupling the feature space (edges) from a desired high-level explanation language (such as motifs) is thus a major challenge towards developing actionable explanations for graph classification tasks. In this paper we introduce GRAPHSHAP, a Shapley-based approach able to provide motif-based explanations for identity-aware graph classifiers, assuming no knowledge whatsoever about the model or its training data: the only requirement is that the classifier can be queried as a black-box at will. For the sake of computational efficiency we explore a progressive approximation strategy and show how a simple kernel can efficiently approximate explanation scores, thus allowing GRAPHSHAP to scale on scenarios with a large explanation space (i.e. a large number of motifs). We showcase GRAPHSHAP on a real-world brain-network dataset consisting of patients affected by Autism Spectrum Disorder and a control group. Our experiments highlight how the classification provided by a black-box model can be effectively explained by a few connectomics patterns.  ( 3 min )
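For a small motif vocabulary, the black-box Shapley scores can be computed exhaustively; the kernel approximation in the paper replaces this when the motif space is large. The motif names and toy score function below are illustrative:

```python
from itertools import combinations
from math import factorial

def shapley(players, value):
    """Exact Shapley values of a black-box set function over a small motif
    vocabulary (exhaustive; only feasible for a handful of motifs)."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        rest = [q for q in players if q != p]
        for r in range(n):
            for S in combinations(rest, r):
                w = factorial(r) * factorial(n - r - 1) / factorial(n)
                phi[p] += w * (value(set(S) | {p}) - value(set(S)))
    return phi

def black_box_score(motifs):
    # toy classifier queried only through its output, as GRAPHSHAP assumes;
    # the motif names are purely illustrative
    return 1.0 * ("triangle" in motifs) + 0.2 * ("star" in motifs)

phi = shapley(["triangle", "star", "clique"], black_box_score)
```

Because the score function here is additive, the Shapley values recover each motif's individual contribution exactly; real classifiers exhibit interactions, which the weighting over coalitions accounts for.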
    Don't trust your eyes: on the (un)reliability of feature visualizations. (arXiv:2306.04719v4 [cs.CV] UPDATED)
    How do neural networks extract patterns from pixels? Feature visualizations attempt to answer this important question by visualizing highly activating patterns through optimization. Today, visualization methods form the foundation of our knowledge about the internal workings of neural networks, as a type of mechanistic interpretability. Here we ask: How reliable are feature visualizations? We start our investigation by developing network circuits that trick feature visualizations into showing arbitrary patterns that are completely disconnected from normal network behavior on natural input. We then provide evidence for a similar phenomenon occurring in standard, unmanipulated networks: feature visualizations are processed very differently from standard input, casting doubt on their ability to "explain" how neural networks process natural images. We underpin this empirical finding with theory, proving that the set of functions that can be reliably understood by feature visualization is extremely small and does not include general black-box neural networks. Therefore, a promising way forward could be the development of networks that enforce certain structures in order to ensure more reliable feature visualizations.  ( 2 min )
    Self-Supervised Time Series Representation Learning via Cross Reconstruction Transformer. (arXiv:2205.09928v2 [cs.LG] UPDATED)
    Unsupervised/self-supervised representation learning in time series is critical since labeled samples are usually scarce in real-world scenarios. Existing approaches mainly leverage the contrastive learning framework, which automatically learns to understand similar and dissimilar data pairs. Nevertheless, they are restricted by the need for prior knowledge to construct pairs, cumbersome sampling policies, and unstable performance when encountering sampling bias. Also, few works have focused on effectively modeling temporal-spectral relations to extend the capacity of representations. In this paper, we aim at learning representations for time series from a new perspective and propose the Cross Reconstruction Transformer (CRT) to solve the aforementioned problems in a unified way. CRT achieves time series representation learning through a cross-domain dropping-reconstruction task. Specifically, we transform time series into the frequency domain and randomly drop certain parts in both the time and frequency domains. Dropping maximally preserves the global context compared to cropping and masking. Then a transformer architecture is utilized to adequately capture the cross-domain correlations between temporal and spectral information by reconstructing data in both domains, which is called Dropped Temporal-Spectral Modeling. To discriminate the representations in the global latent space, we propose an Instance Discrimination Constraint to reduce the mutual information between different time series and sharpen the decision boundaries. Additionally, we propose a specified curriculum learning strategy to optimize the CRT, which progressively increases the dropping ratio during training.  ( 3 min )
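The cross-domain dropping step can be sketched with a naive DFT; the signal and dropped indices below are illustrative (in CRT the drops are random and the reconstruction is learned):

```python
import cmath

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def drop(seq, idxs):
    # zero out the dropped positions; the model must reconstruct them
    return [0.0 if i in idxs else v for i, v in enumerate(seq)]

# Cross-domain dropping (sketch): remove patches of a series in the time
# domain and bins of its spectrum, then train a transformer to reconstruct
# both views; the indices here are illustrative.
x = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
x_time_dropped = drop(x, {2, 3})
spectrum = dft(x)
spectrum_dropped = drop(spectrum, {1})
```

Because a dropped time patch still leaves the spectrum largely intact (and vice versa), reconstructing both views forces the encoder to relate temporal and spectral information.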
    Coupled Gradient Flows for Strategic Non-Local Distribution Shift. (arXiv:2307.01166v2 [cs.LG] UPDATED)
    We propose a novel framework for analyzing the dynamics of distribution shift in real-world systems that captures the feedback loop between learning algorithms and the distributions on which they are deployed. Prior work largely models feedback-induced distribution shift as adversarial or via an overly simplistic distribution-shift structure. In contrast, we propose a coupled partial differential equation model that captures fine-grained changes in the distribution over time by accounting for complex dynamics that arise due to strategic responses to algorithmic decision-making, non-local endogenous population interactions, and other exogenous sources of distribution shift. We consider two common settings in machine learning: cooperative settings with information asymmetries, and competitive settings where a learner faces strategic users. For both of these settings, when the algorithm retrains via gradient descent, we prove asymptotic convergence of the retraining procedure to a steady-state, both in finite and in infinite dimensions, obtaining explicit rates in terms of the model parameters. To do so we derive new results on the convergence of coupled PDEs that extends what is known on multi-species systems. Empirically, we show that our approach captures well-documented forms of distribution shifts like polarization and disparate impacts that simpler models cannot capture.  ( 2 min )
    SocNavGym: A Reinforcement Learning Gym for Social Navigation. (arXiv:2304.14102v2 [cs.RO] UPDATED)
    It is essential for autonomous robots to be socially compliant while navigating in human-populated environments. Machine Learning and, especially, Deep Reinforcement Learning have recently gained considerable traction in the field of Social Navigation. This can be partially attributed to the resulting policies not being bound by human limitations in terms of code complexity or the number of variables that are handled. Unfortunately, the lack of safety guarantees and the large data requirements of DRL algorithms make learning in the real world unfeasible. To bridge this gap, simulation environments are frequently used. We propose SocNavGym, an advanced simulation environment for social navigation that can generate a wide variety of social navigation scenarios and facilitates the development of intelligent social agents. SocNavGym is light-weight, fast, easy-to-use, and can be effortlessly configured to generate different types of social navigation scenarios. It can also be configured to work with different hand-crafted and data-driven social reward signals and to yield a variety of evaluation metrics to benchmark agents' performance. Further, we provide a case study where a Dueling-DQN agent is trained to learn social-navigation policies using SocNavGym. The results provide evidence that SocNavGym can be used to train an agent from scratch to navigate in simple as well as complex social scenarios. Our experiments also show that agents trained using the data-driven reward function display more advanced social compliance than those trained with the heuristic-based reward function.  ( 3 min )
    Equivariance with Learned Canonicalization Functions. (arXiv:2211.06489v3 [cs.LG] UPDATED)
    Symmetry-based neural networks often constrain the architecture in order to achieve invariance or equivariance to a group of transformations. In this paper, we propose an alternative that avoids this architectural constraint by learning to produce canonical representations of the data. These canonicalization functions can readily be plugged into non-equivariant backbone architectures. We offer explicit ways to implement them for some groups of interest. We show that this approach enjoys universality while providing interpretable insights. Our main hypothesis, supported by our empirical results, is that learning a small neural network to perform canonicalization is better than using predefined heuristics. Our experiments show that learning the canonicalization function is competitive with existing techniques for learning equivariant functions across many tasks, including image classification, $N$-body dynamics prediction, point cloud classification and part segmentation, while being faster across the board.  ( 2 min )
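The canonicalize-then-predict pipeline can be illustrated in 2-D; the canonicalization rule below is a hand-crafted stand-in for the paper's *learned* function, and the backbone is a toy feature:

```python
import math

def rotate(points, theta):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def canonicalize(points):
    """Rotate a 2-D point set so its centroid lies on the +x axis: a
    hand-crafted stand-in for a learned canonicalization function,
    shown only to illustrate the pipeline."""
    mx = sum(p[0] for p in points) / len(points)
    my = sum(p[1] for p in points) / len(points)
    return rotate(points, -math.atan2(my, mx))

def backbone(points):
    # any non-equivariant model could go here; a toy feature suffices
    return sum(x + 2.0 * y for x, y in points)

pts = [(1.0, 0.0), (2.0, 1.0), (0.5, -0.3)]
out_a = backbone(canonicalize(pts))
out_b = backbone(canonicalize(rotate(pts, 1.3)))   # rotated input, same output
```

Because every rotation of the input maps to the same canonical form, the composite `backbone(canonicalize(...))` is rotation-invariant even though the backbone itself is not, which is the architectural freedom the paper exploits.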
    Detection of Groups with Biased Representation in Ranking. (arXiv:2301.00719v2 [cs.LG] UPDATED)
    Real-life tools for decision-making in many critical domains are based on ranking results. With the increasing awareness of algorithmic fairness, recent works have presented measures for fairness in ranking. Many of those definitions consider the representation of different ``protected groups'', in the top-$k$ ranked items, for any reasonable $k$. Given the protected groups, confirming algorithmic fairness is a simple task. However, the groups' definitions may be unknown in advance. In this paper, we study the problem of detecting groups with biased representation in the top-$k$ ranked items, eliminating the need to pre-define protected groups. The number of possible groups can be exponential, making the problem hard. We propose efficient search algorithms for two different fairness measures: global representation bounds, and proportional representation. Then we propose a method to explain the bias in the representations of groups utilizing the notion of Shapley values. We conclude with an experimental study, showing the scalability of our approach and demonstrating the usefulness of the proposed algorithms.  ( 2 min )
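For a single known group, a proportional-representation check is straightforward; the search problem the paper tackles is finding, among exponentially many candidate groups, those that fail it. The threshold `alpha` below is illustrative:

```python
def proportional_representation_bias(ranking, group, k, alpha=0.8):
    """Flag under-representation in the top-k: the group's share there
    should be at least alpha times its share of the full ranking (one
    simple reading of proportional representation; alpha illustrative)."""
    overall = sum(1 for item in ranking if item in group) / len(ranking)
    topk = sum(1 for item in ranking[:k] if item in group) / k
    return topk < alpha * overall

ranking = ["a", "b", "c", "d", "e", "f", "g", "h"]
group = {"e", "f", "g", "h"}          # 50% overall, absent from the top 4
biased = proportional_representation_bias(ranking, group, k=4)
```

A group holding half the ranking but none of the top 4 is flagged, while a group evenly spread across the ranking is not.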
    Elastic Decision Transformer. (arXiv:2307.02484v2 [cs.LG] UPDATED)
    This paper introduces Elastic Decision Transformer (EDT), a significant advancement over the existing Decision Transformer (DT) and its variants. Although DT purports to generate an optimal trajectory, empirical evidence suggests it struggles with trajectory stitching, a process involving the generation of an optimal or near-optimal trajectory from the best parts of a set of sub-optimal trajectories. The proposed EDT differentiates itself by facilitating trajectory stitching during action inference at test time, achieved by adjusting the history length maintained in DT. Further, the EDT optimizes the trajectory by retaining a longer history when the previous trajectory is optimal and a shorter one when it is sub-optimal, enabling it to "stitch" with a more optimal trajectory. Extensive experimentation demonstrates EDT's ability to bridge the performance gap between DT-based and Q Learning-based approaches. In particular, the EDT outperforms Q Learning-based methods in a multi-task regime on the D4RL locomotion benchmark and Atari games. Videos are available at: https://kristery.github.io/edt/  ( 2 min )
    Deep Optimal Transport for Domain Adaptation on SPD Manifolds. (arXiv:2201.05745v3 [cs.LG] UPDATED)
    In recent years, there has been significant interest in solving the domain adaptation (DA) problem on symmetric positive definite (SPD) manifolds within the machine learning community. This interest stems from the fact that complex neurophysiological data generated by medical equipment, such as electroencephalograms, magnetoencephalograms, and diffusion tensor imaging, often exhibit a shift in data distribution across different domains. These data representations, represented by signal covariance matrices, possess properties of symmetry and positive definiteness. However, directly applying previous experiences and solutions to the DA problem poses challenges due to the manipulation complexities of covariance matrices. To address this, our research introduces a category of deep learning-based transfer learning approaches called deep optimal transport. This category utilizes optimal transport theory and leverages the Log-Euclidean geometry for SPD manifolds. Additionally, we present a comprehensive categorization of existing geometric methods to tackle these problems effectively. This categorization provides practical solutions for specific DA problems, including handling discrepancies in marginal and conditional distributions between the source and target domains on the SPD manifold. To evaluate the effectiveness, we conduct experiments on three publicly available highly non-stationary cross-session brain-computer interface scenarios. Moreover, we provide visualization results on the SPD cone to offer further insights into the framework.  ( 3 min )
    A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation. (arXiv:2307.03270v1 [cs.GR])
    Animating still face images with deep generative models using a speech input signal is an active research topic and has seen important recent progress. However, much of the effort has been put into lip syncing and rendering quality while the generation of natural head motion, let alone the audio-visual correlation between head motion and speech, has often been neglected. In this work, we propose a multi-scale audio-visual synchrony loss and a multi-scale autoregressive GAN to better handle short and long-term correlation between speech and the dynamics of the head and lips. In particular, we train a stack of syncer models on multimodal input pyramids and use these models as guidance in a multi-scale generator network to produce audio-aligned motion unfolding over diverse time scales. Our generator operates in the facial landmark domain, which is a standard low-dimensional head representation. The experiments show significant improvements over the state of the art in head motion dynamics quality and in multi-scale audio-visual synchrony both in the landmark domain and in the image domain.  ( 2 min )
    Analyzing the vulnerabilities in SplitFed Learning: Assessing the robustness against Data Poisoning Attacks. (arXiv:2307.03197v1 [cs.LG])
    Distributed Collaborative Machine Learning (DCML) is a potential alternative to address the privacy concerns associated with centralized machine learning. Split Learning (SL) and Federated Learning (FL) are two effective learning approaches in DCML. Recently there has been an increased interest in the hybrid of FL and SL known as SplitFed Learning (SFL). This research is the earliest attempt to study, analyze, and present the impact of data poisoning attacks in SFL. We propose three kinds of novel attack strategies, namely untargeted, targeted, and distance-based attacks, for SFL. All the attack strategies aim to degrade the performance of the DCML-based classifier. We test the proposed attack strategies on two case studies: electrocardiogram signal classification and automatic handwritten digit recognition. A series of attack experiments was conducted by varying the percentage of malicious clients and the choice of the model split layer between the clients and the server. A comprehensive analysis of the attack strategies clearly shows that untargeted and distance-based poisoning attacks have a greater impact on evading the classifier outcomes compared to targeted attacks in SFL.  ( 2 min )
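The difference between untargeted and targeted poisoning can be sketched as label flipping on a malicious client's data; this is an illustrative sketch of the attack families, not the paper's exact SFL attacks (which also include distance-based poisoning on activations):

```python
import random

def poison_labels(labels, num_classes, frac, targeted_to=None, rng=None):
    """Label-flipping sketch: untargeted flips each selected label to a
    random *wrong* class, while a targeted attack maps every selected
    label to one chosen class (illustrative only)."""
    rng = rng or random.Random(0)
    out = list(labels)
    for i in rng.sample(range(len(out)), int(frac * len(out))):
        if targeted_to is not None:
            out[i] = targeted_to
        else:
            out[i] = rng.choice([c for c in range(num_classes) if c != out[i]])
    return out
```

Varying `frac` corresponds to the paper's sweep over the percentage of malicious clients; untargeted flipping spreads the damage across all classes, which is consistent with its larger reported impact.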
    Assisting Clinical Decisions for Scarcely Available Treatment via Disentangled Latent Representation. (arXiv:2307.03315v1 [cs.LG])
    Extracorporeal membrane oxygenation (ECMO) is an essential life-supporting modality for COVID-19 patients who are refractory to conventional therapies. However, the proper treatment decision has been the subject of significant debate and it remains controversial who benefits from this scarcely available and technically complex treatment option. To support clinical decisions, there is a critical need to predict the treatment need and the potential treatment and no-treatment responses. Targeting this clinical challenge, we propose the Treatment Variational AutoEncoder (TVAE), a novel approach for individualized treatment analysis. TVAE is specifically designed to address modeling challenges such as the strong treatment selection bias and scarce treatment cases of ECMO. TVAE conceptualizes the treatment decision as a multi-scale problem. We model a patient's potential treatment assignment and the factual and counterfactual outcomes as part of their intrinsic characteristics that can be represented by a deep latent variable model. The factual and counterfactual prediction errors are alleviated via a reconstruction regularization scheme together with semi-supervision, and the selection bias and the scarcity of treatment cases are mitigated by the disentangled and distribution-matched latent space and the label-balancing generative strategy. We evaluate TVAE on two real-world COVID-19 datasets: an international dataset collected from 1651 hospitals across 63 countries, and an institutional dataset collected from 15 hospitals. The results show that TVAE outperforms state-of-the-art treatment effect models in predicting both the propensity scores and factual outcomes on heterogeneous COVID-19 datasets. Additional experiments also show TVAE outperforms the best existing models in individual treatment effect estimation on the synthesized IHDP benchmark dataset.  ( 3 min )
    OmniBoost: Boosting Throughput of Heterogeneous Embedded Devices under Multi-DNN Workload. (arXiv:2307.03290v1 [cs.LG])
    Modern Deep Neural Networks (DNNs) exhibit profound efficiency and accuracy properties. This has introduced application workloads that comprise multiple DNN applications, raising new challenges regarding workload distribution. Equipped with a diverse set of accelerators, newer embedded systems present architectural heterogeneity, which current run-time controllers are unable to fully utilize. To enable high throughput in multi-DNN workloads, such a controller must explore hundreds of thousands of possible solutions to exploit the underlying heterogeneity. In this paper, we propose OmniBoost, a lightweight and extensible multi-DNN manager for heterogeneous embedded devices. We leverage stochastic space exploration combined with a highly accurate performance estimator to observe a 4.6x average throughput boost compared to other state-of-the-art methods. The evaluation was performed on the HiKey970 development board.  ( 2 min )
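    The search loop behind such a manager can be sketched as stochastic exploration scored by a performance estimator (the estimator below is a made-up placeholder, not OmniBoost's learned predictor):

```python
import random

def random_search_mapping(num_dnns, accelerators, estimate_throughput,
                          iters=200, seed=0):
    """Stochastic exploration of DNN-to-accelerator mappings.

    Toy stand-in for OmniBoost's search: sample candidate mappings and keep
    the one the (assumed) throughput estimator scores highest.
    """
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(iters):
        mapping = [rng.choice(accelerators) for _ in range(num_dnns)]
        score = estimate_throughput(mapping)
        if score > best_score:
            best, best_score = mapping, score
    return best, best_score

# Toy estimator: spreading DNNs over distinct accelerators scores higher.
est = lambda m: len(set(m))
mapping, score = random_search_mapping(3, ["CPU", "GPU", "NPU"], est)
```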
    ACDNet: Attention-guided Collaborative Decision Network for Effective Medication Recommendation. (arXiv:2307.03332v1 [cs.LG])
    Medication recommendation using Electronic Health Records (EHR) is challenging due to complex medical data. Current approaches extract longitudinal information from patient EHR to personalize recommendations. However, existing models often lack sufficient patient representation and overlook the importance of considering the similarity between a patient's medication records and specific medicines. Therefore, an Attention-guided Collaborative Decision Network (ACDNet) for medication recommendation is proposed in this paper. Specifically, ACDNet utilizes an attention mechanism and a Transformer to effectively capture patient health conditions and medication records by modeling their historical visits at both global and local levels. ACDNet also employs a collaborative decision framework, utilizing the similarity between medication records and medicine representation to facilitate the recommendation process. The experimental results on two extensive medical datasets, MIMIC-III and MIMIC-IV, clearly demonstrate that ACDNet outperforms state-of-the-art models in terms of Jaccard, PR-AUC, and F1 score, reaffirming its superiority. Moreover, the ablation experiments provide solid evidence of the effectiveness of each module in ACDNet, validating their contribution to the overall performance. Furthermore, a detailed case study reinforces the effectiveness of ACDNet in medication recommendation based on EHR data, showcasing its practical value in real-world healthcare scenarios.  ( 2 min )
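    The collaborative-decision step, scoring candidate medicines by similarity to the patient's record representation, might look like the following sketch (the embeddings and the cosine-similarity choice are illustrative assumptions, not taken from the paper):

```python
import math

def cosine_scores(record_vec, medicine_vecs):
    """Score each candidate medicine by cosine similarity between the
    patient's aggregated medication-record vector and the medicine's
    representation. Both embedding spaces are assumed for illustration.
    """
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    norm = lambda a: math.sqrt(dot(a, a))
    return [dot(record_vec, m) / (norm(record_vec) * norm(m))
            for m in medicine_vecs]

# A record aligned with the first medicine, orthogonal to the second.
scores = cosine_scores([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```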
    Empirical Analysis of a Segmentation Foundation Model in Prostate Imaging. (arXiv:2307.03266v1 [eess.IV])
    Most state-of-the-art techniques for medical image segmentation rely on deep-learning models. These models, however, are often trained on narrowly-defined tasks in a supervised fashion, which requires expensive labeled datasets. Recent advances in several machine learning domains, such as natural language generation, have demonstrated the feasibility and utility of building foundation models that can be customized for various downstream tasks with little to no labeled data. This likely represents a paradigm shift for medical imaging, where we expect that foundation models may shape the future of the field. In this paper, we consider a recently developed foundation model for medical image segmentation, UniverSeg. We conduct an empirical evaluation study in the context of prostate imaging and compare it against the conventional approach of training a task-specific segmentation model. Our results and discussion highlight several factors that will likely be important in the development and adoption of foundation models for medical image segmentation.  ( 2 min )
    Neural Network Field Theories: Non-Gaussianity, Actions, and Locality. (arXiv:2307.03223v1 [hep-th])
    Both the path integral measure in field theory and ensembles of neural networks describe distributions over functions. When the central limit theorem can be applied in the infinite-width (infinite-$N$) limit, the ensemble of networks corresponds to a free field theory. Although an expansion in $1/N$ corresponds to interactions in the field theory, other expansions, such as one in a small breaking of the statistical independence of network parameters, can also lead to interacting theories. These other expansions can be advantageous over the $1/N$-expansion, for example by improved behavior with respect to the universal approximation theorem. Given the connected correlators of a field theory, one can systematically reconstruct the action order-by-order in the expansion parameter, using a new Feynman diagram prescription whose vertices are the connected correlators. This method is motivated by the Edgeworth expansion and allows one to derive actions for neural network field theories. Conversely, the correspondence allows one to engineer architectures realizing a given field theory by representing action deformations as deformations of neural network parameter densities. As an example, $\phi^4$ theory is realized as an infinite-$N$ neural network field theory.  ( 2 min )
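    The free-field claim can be checked empirically: a random one-hidden-layer network's output should lose its connected four-point function (excess kurtosis) as the width grows. A rough Monte Carlo sketch, with architecture, scaling, and sample sizes chosen purely for illustration:

```python
import math
import random

def output_kurtosis(width, n_samples=4000, seed=0):
    """Estimate the excess kurtosis of a random tanh network's scalar output
    at a fixed input. Excess kurtosis is zero for a Gaussian, so it should
    shrink toward zero as width N grows (central limit theorem).
    """
    rng = random.Random(seed)
    outs = []
    for _ in range(n_samples):
        total = 0.0
        for _ in range(width):
            w = rng.gauss(0.0, 1.0)  # input-to-hidden weight (input x = 1)
            v = rng.gauss(0.0, 1.0)  # hidden-to-output weight
            total += v * math.tanh(w)
        outs.append(total / math.sqrt(width))  # 1/sqrt(N) output scaling
    mean = sum(outs) / n_samples
    var = sum((o - mean) ** 2 for o in outs) / n_samples
    m4 = sum((o - mean) ** 4 for o in outs) / n_samples
    return m4 / var ** 2 - 3.0

narrow = abs(output_kurtosis(2))    # strongly interacting (small N)
wide = abs(output_kurtosis(200))    # nearly free (large N)
```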
    Vision Language Transformers: A Survey. (arXiv:2307.03254v1 [cs.CV])
    Vision language tasks, such as answering questions about or generating captions that describe an image, are difficult tasks for computers to perform. A relatively recent body of research has adapted the pretrained transformer architecture introduced in \citet{vaswani2017attention} to vision language modeling. Transformer models have greatly improved performance and versatility over previous vision language models. They do so by pretraining models on large generic datasets and transferring their learning to new tasks with minor changes in architecture and parameter values. This type of transfer learning has become the standard modeling practice in both natural language processing and computer vision. Vision language transformers offer the promise of producing similar advancements in tasks which require both vision and language. In this paper, we provide a broad synthesis of the currently available research on vision language transformer models and offer some analysis of their strengths, limitations and some open questions that remain.  ( 2 min )
    Scaling Laws Do Not Scale. (arXiv:2307.03201v1 [cs.LG])
    Recent work has proposed a power law relationship, referred to as ``scaling laws,'' between the performance of artificial intelligence (AI) models and aspects of those models' design (e.g., dataset size). In other words, as the size of a dataset (or model parameters, etc) increases, the performance of a given model trained on that dataset will correspondingly increase. However, while compelling in the aggregate, this scaling law relationship overlooks the ways that metrics used to measure performance may be precarious and contested, or may not correspond with how different groups of people may perceive the quality of models' output. In this paper, we argue that as the size of datasets used to train large AI models grows, the number of distinct communities (including demographic groups) whose data is included in a given dataset is likely to grow, each of whom may have different values. As a result, there is an increased risk that communities represented in a dataset may have values or preferences not captured by (or in the worst case, at odds with) the metrics used to evaluate model performance for scaling laws. We end the paper with implications for AI scaling laws -- that models may not, in fact, continue to improve as the datasets get larger -- at least not for all people or communities impacted by those models.  ( 2 min )
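    The scaling-law form under discussion is a power law, which can be fit by ordinary least squares in log-log space; a minimal sketch on synthetic data:

```python
import math

def fit_power_law(ns, losses):
    """Fit loss = a * n**(-b) by least squares in log-log space.

    Illustrates the aggregate 'scaling law' form the paper critiques:
    the exponent b summarizes how fast measured loss falls with size n.
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(v) for v in losses]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    b = -slope
    a = math.exp(my + b * mx)
    return a, b

# Synthetic losses generated exactly from 2.0 * n**(-0.5).
ns = [10, 100, 1000, 10000]
a, b = fit_power_law(ns, [2.0 * n ** -0.5 for n in ns])
```

    The paper's point is precisely that the single fitted pair (a, b) hides how loss is distributed across the communities contributing to the dataset.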
    On Invariance, Equivariance, Correlation and Convolution of Spherical Harmonic Representations for Scalar and Vectorial Data. (arXiv:2307.03311v1 [cs.LG])
    The mathematical representation of data in the Spherical Harmonic (SH) domain has recently regained increasing interest in the machine learning community. This technical report gives an in-depth introduction to the theoretical foundation and practical implementation of SH representations, summarizing works on rotation-invariant and equivariant features, as well as convolutions and exact correlations of signals on spheres. By extension, these methods are then generalized from scalar SH representations to Vectorial Harmonics (VH), providing the same capabilities for 3d vector fields on spheres.  ( 2 min )
    Federated Variational Inference Methods for Structured Latent Variable Models. (arXiv:2302.03314v2 [stat.ML] UPDATED)
    Federated learning methods enable model training across distributed data sources without data leaving their original locations and have gained increasing interest in various fields. However, existing approaches are limited, excluding many structured probabilistic models. We present a general and elegant solution based on structured variational inference, widely used in Bayesian machine learning, adapted for the federated setting. Additionally, we provide a communication-efficient variant analogous to the canonical FedAvg algorithm. The proposed algorithms' effectiveness is demonstrated, and their performance is compared with hierarchical Bayesian neural networks and topic models.
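    For reference, the canonical FedAvg aggregation step that the paper's communication-efficient variant is analogous to is a dataset-size-weighted average of client parameters:

```python
def fedavg(client_params, client_sizes):
    """FedAvg aggregation: average client parameter vectors, weighting each
    client by its local dataset size. This is the canonical rule; the
    paper's variant applies the analogous idea to variational parameters.
    """
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [sum(p[d] * n for p, n in zip(client_params, client_sizes)) / total
            for d in range(dim)]

# Two clients: the second holds 3x the data, so its parameters dominate.
avg = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```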
    Scalable High-Dimensional Multivariate Linear Regression for Feature-Distributed Data. (arXiv:2307.03410v1 [stat.ML])
    Feature-distributed data, referring to data partitioned by features and stored across multiple computing nodes, are increasingly common in applications with a large number of features. This paper proposes a two-stage relaxed greedy algorithm (TSRGA) for applying multivariate linear regression to such data. The main advantage of TSRGA is that its communication complexity does not depend on the feature dimension, making it highly scalable to very large data sets. In addition, for multivariate response variables, TSRGA can be used to yield low-rank coefficient estimates. The fast convergence of TSRGA is validated by simulation experiments. Finally, we apply the proposed TSRGA in a financial application that leverages unstructured data from the 10-K reports, demonstrating its usefulness in applications with many dense large-dimensional matrices.  ( 2 min )
    Black-Box Batch Active Learning for Regression. (arXiv:2302.08981v2 [cs.LG] UPDATED)
    Batch active learning is a popular approach for efficiently training machine learning models on large, initially unlabelled datasets by repeatedly acquiring labels for batches of data points. However, many recent batch active learning methods are white-box approaches and are often limited to differentiable parametric models: they score unlabeled points using acquisition functions based on model embeddings or first- and second-order derivatives. In this paper, we propose black-box batch active learning for regression tasks as an extension of white-box approaches. Crucially, our method only relies on model predictions. This approach is compatible with a wide range of machine learning models, including regular and Bayesian deep learning models and non-differentiable models such as random forests. It is rooted in Bayesian principles and utilizes recent kernel-based approaches. This allows us to extend a wide range of existing state-of-the-art white-box batch active learning methods (BADGE, BAIT, LCMD) to black-box models. We demonstrate the effectiveness of our approach through extensive experimental evaluations on regression datasets, achieving surprisingly strong performance compared to white-box approaches for deep learning models.
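    One way to make the black-box idea concrete: represent each pool point by its vector of model predictions (e.g. across an ensemble or checkpoints) and select batches greedily in that space. This farthest-point sketch is a simplified stand-in for the kernel-based rules (BADGE, BAIT, LCMD) the paper extends, not the paper's algorithm:

```python
def greedy_batch(pred_feats, batch_size):
    """Greedy farthest-point batch selection in prediction space.

    Each pool point is a tuple of model predictions (an assumed black-box
    feature); we repeatedly pick the point farthest from everything chosen,
    which naturally skips near-duplicate points.
    """
    dist2 = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    picks = [0]  # seed with the first pool point
    while len(picks) < batch_size:
        best_i, best_d = None, -1.0
        for i, f in enumerate(pred_feats):
            if i in picks:
                continue
            d = min(dist2(f, pred_feats[j]) for j in picks)  # nearest chosen
            if d > best_d:
                best_i, best_d = i, d
        picks.append(best_i)
    return picks

# Four pool points in "prediction space"; points 0 and 1 are near-duplicates.
feats = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 6.0)]
batch = greedy_batch(feats, 3)
```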
    Backward Feature Correction: How Deep Learning Performs Deep (Hierarchical) Learning. (arXiv:2001.04413v6 [cs.LG] UPDATED)
    Deep learning is also known as hierarchical learning, where the learner _learns_ to represent a complicated target function by decomposing it into a sequence of simpler functions to reduce sample and time complexity. This paper formally analyzes how multi-layer neural networks can perform such hierarchical learning _efficiently_ and _automatically_ by SGD on the training objective. On the conceptual side, we present theoretical characterizations of how certain types of deep (i.e. super-constant layer) neural networks can still be sample and time efficiently trained on some hierarchical tasks, when no existing algorithm (including layerwise training, kernel method, etc) is known to be efficient. We establish a new principle called "backward feature correction", where the errors in the lower-level features can be automatically corrected when training together with the higher-level layers. We believe this is a key behind how deep learning is performing deep (hierarchical) learning, as opposed to layerwise learning or simulating some non-hierarchical method. On the technical side, we show for every input dimension $d > 0$, there is a concept class of degree $\omega(1)$ multi-variate polynomials so that, using $\omega(1)$-layer neural networks as learners, SGD can learn any function from this class in $\mathsf{poly}(d)$ time to any $\frac{1}{\mathsf{poly}(d)}$ error, through learning to represent it as a composition of $\omega(1)$ layers of quadratic functions using "backward feature correction." In contrast, we do not know any other simpler algorithm (including layerwise training, applying kernel method sequentially, training a two-layer network, etc) that can learn this concept class in $\mathsf{poly}(d)$ time even to any $d^{-0.01}$ error. As a side result, we prove $d^{\omega(1)}$ lower bounds for several non-hierarchical learners, including any kernel methods.
    GeoPhy: Differentiable Phylogenetic Inference via Geometric Gradients of Tree Topologies. (arXiv:2307.03675v1 [cs.LG])
    Phylogenetic inference, grounded in molecular evolution models, is essential for understanding the evolutionary relationships in biological data. Accounting for the uncertainty of phylogenetic tree variables, which include tree topologies and evolutionary distances on branches, is crucial for accurately inferring species relationships from molecular data and tasks requiring variable marginalization. Variational Bayesian methods are key to developing scalable, practical models; however, it remains challenging to conduct phylogenetic inference without restricting the combinatorially vast number of possible tree topologies. In this work, we introduce a novel, fully differentiable formulation of phylogenetic inference that leverages a unique representation of topological distributions in continuous geometric spaces. Through practical considerations on design spaces and control variates for gradient estimations, our approach, GeoPhy, enables variational inference without limiting the topological candidates. In experiments using real benchmark datasets, GeoPhy significantly outperformed other approximate Bayesian methods that considered whole topologies.
    Identifying Patient-Specific Root Causes with the Heteroscedastic Noise Model. (arXiv:2205.13085v2 [stat.ML] UPDATED)
    Complex diseases are caused by a multitude of factors that may differ between patients even within the same diagnostic category. A few underlying root causes may nevertheless initiate the development of disease within each patient. We therefore focus on identifying patient-specific root causes of disease, which we equate to the sample-specific predictivity of the exogenous error terms in a structural equation model. We generalize from the linear setting to the heteroscedastic noise model where $Y = m(X) + \varepsilon\sigma(X)$ with non-linear functions $m(X)$ and $\sigma(X)$ representing the conditional mean and mean absolute deviation, respectively. This model preserves identifiability but introduces non-trivial challenges that require a customized algorithm called Generalized Root Causal Inference (GRCI) to extract the error terms correctly. GRCI recovers patient-specific root causes more accurately than existing alternatives.  ( 2 min )
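    Under the heteroscedastic noise model, once $m$ and $\sigma$ are known, error extraction is a one-liner; GRCI's actual work lies in estimating them correctly, which this sketch assumes away:

```python
def extract_errors(xs, ys, m, sigma):
    """Heteroscedastic noise model Y = m(X) + eps * sigma(X): given the
    conditional mean m and scale sigma (assumed known here; GRCI estimates
    them from data), recover the exogenous error terms eps.
    """
    return [(y - m(x)) / sigma(x) for x, y in zip(xs, ys)]

# Toy model m(x) = 2x, sigma(x) = 1 + x^2; data built from known errors.
m = lambda x: 2.0 * x
sigma = lambda x: 1.0 + x * x
eps_true = [0.5, -1.0, 2.0]
xs = [0.0, 1.0, 2.0]
ys = [m(x) + e * sigma(x) for x, e in zip(xs, eps_true)]
eps = extract_errors(xs, ys, m, sigma)
```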
    F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning. (arXiv:2004.11145v2 [cs.LG] UPDATED)
    Traditional centralized multi-agent reinforcement learning (MARL) algorithms are sometimes unpractical in complicated applications, due to non-interactivity between agents, curse of dimensionality and computation complexity. Hence, several decentralized MARL algorithms are motivated. However, existing decentralized methods only handle the fully cooperative setting where massive information needs to be transmitted in training. The block coordinate gradient descent scheme they used for successive independent actor and critic steps can simplify the calculation, but it causes serious bias. In this paper, we propose a flexible fully decentralized actor-critic MARL framework, which can combine most of actor-critic methods, and handle large-scale general cooperative multi-agent setting. A primal-dual hybrid gradient descent type algorithm framework is designed to learn individual agents separately for decentralization. From the perspective of each agent, policy improvement and value evaluation are jointly optimized, which can stabilize multi-agent policy learning. Furthermore, our framework can achieve scalability and stability for large-scale environment and reduce information transmission, by the parameter sharing mechanism and a novel modeling-other-agents methods based on theory-of-mind and online supervised learning. Sufficient experiments in cooperative Multi-agent Particle Environment and StarCraft II show that our decentralized MARL instantiation algorithms perform competitively against conventional centralized and decentralized methods.  ( 3 min )
    On the convergence of dynamic implementations of Hamiltonian Monte Carlo and No U-Turn Samplers. (arXiv:2307.03460v1 [stat.CO])
    There is substantial empirical evidence about the success of dynamic implementations of Hamiltonian Monte Carlo (HMC), such as the No U-Turn Sampler (NUTS), in many challenging inference problems but theoretical results about their behavior are scarce. The aim of this paper is to fill this gap. More precisely, we consider a general class of MCMC algorithms we call dynamic HMC. We show that this general framework encompasses NUTS as a particular case, implying the invariance of the target distribution as a by-product. Second, we establish conditions under which NUTS is irreducible and aperiodic and, as a corollary, ergodic. Under conditions similar to the ones existing for HMC, we also show that NUTS is geometrically ergodic. Finally, we improve existing convergence results for HMC showing that this method is ergodic without any boundedness condition on the stepsize and the number of leapfrog steps, in the case where the target is a perturbation of a Gaussian distribution.  ( 2 min )
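    The shared computational core of HMC and its dynamic variants is the leapfrog integrator, which is time-reversible and nearly energy-conserving; a minimal sketch on a standard Gaussian target:

```python
def leapfrog(q, p, grad_u, eps, steps):
    """Leapfrog integrator for Hamiltonian dynamics with potential U
    (the negative log target density); the building block of HMC and NUTS.
    """
    p = p - 0.5 * eps * grad_u(q)      # initial momentum half-step
    for _ in range(steps - 1):
        q = q + eps * p                # full position step
        p = p - eps * grad_u(q)        # full momentum step
    q = q + eps * p
    p = p - 0.5 * eps * grad_u(q)      # final momentum half-step
    return q, p

# Standard Gaussian target: U(q) = q^2 / 2, so grad U(q) = q.
H = lambda q, p: 0.5 * q * q + 0.5 * p * p
q0, p0 = 1.0, 0.5
q1, p1 = leapfrog(q0, p0, lambda q: q, eps=0.1, steps=20)
drift = abs(H(q1, p1) - H(q0, p0))  # should stay small for modest eps
```

    Reversibility (integrating back from the endpoint with negated momentum returns to the start) is what makes the Metropolis correction in HMC and the tree-building in NUTS valid.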
    Beyond Cuts in Small Signal Scenarios -- Enhanced Sneutrino Detectability Using Machine Learning. (arXiv:2108.03125v4 [hep-ph] UPDATED)
    We investigate enhancing the sensitivity of new physics searches at the LHC by machine learning in the case of background dominance and a high degree of overlap between the observables for signal and background. We use two different models, XGBoost and a deep neural network, to exploit correlations between observables and compare this approach to the traditional cut-and-count method. We consider different methods to analyze the models' output, finding that a template fit generally performs better than a simple cut. By means of a Shapley decomposition, we gain additional insight into the relationship between event kinematics and the machine learning model output. We consider a supersymmetric scenario with a metastable sneutrino as a concrete example, but the methodology can be applied to a much wider class of models.  ( 2 min )
    Online Network Source Optimization with Graph-Kernel MAB. (arXiv:2307.03641v1 [cs.LG])
    We propose Grab-UCB, a graph-kernel multi-arm bandit algorithm to learn online the optimal source placement in large scale networks, such that the reward obtained from a priori unknown network processes is maximized. The uncertainty calls for online learning, which suffers however from the curse of dimensionality. To achieve sample efficiency, we describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations. This enables a data-efficient learning framework, whose learning rate scales with the dimension of the spectral representation model instead of the one of the network. We then propose Grab-UCB, an online sequential decision strategy that learns the parameters of the spectral representation while optimizing the action strategy. We derive performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy. We introduce a computationally simplified solving method, Grab-arm-Light, an algorithm that walks along the edges of the polytope representing the objective function. Simulation results show that the proposed online learning algorithm outperforms baseline offline methods that typically separate the learning phase from the testing one. The results confirm the theoretical findings, and further highlight the gain of the proposed online learning strategy in terms of cumulative regret, sample efficiency and computational complexity.  ( 2 min )
    BOF-UCB: A Bayesian-Optimistic Frequentist Algorithm for Non-Stationary Contextual Bandits. (arXiv:2307.03587v1 [cs.LG])
    We propose a novel Bayesian-Optimistic Frequentist Upper Confidence Bound (BOF-UCB) algorithm for stochastic contextual linear bandits in non-stationary environments. This unique combination of Bayesian and frequentist principles enhances adaptability and performance in dynamic settings. The BOF-UCB algorithm utilizes sequential Bayesian updates to infer the posterior distribution of the unknown regression parameter, and subsequently employs a frequentist approach to compute the Upper Confidence Bound (UCB) by maximizing the expected reward over the posterior distribution. We provide theoretical guarantees of BOF-UCB's performance and demonstrate its effectiveness in balancing exploration and exploitation on synthetic datasets and classical control tasks in a reinforcement learning setting. Our results show that BOF-UCB outperforms existing methods, making it a promising solution for sequential decision-making in non-stationary environments.  ( 2 min )
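    The two-stage recipe, a conjugate Bayesian posterior update followed by a UCB score over that posterior, can be sketched for a single arm with Gaussian rewards (a scalar simplification of the paper's contextual linear setting, with illustrative hyperparameters):

```python
import math

class BayesUCBArm:
    """Scalar sketch of the BOF-UCB idea: Bayesian updates maintain a
    Gaussian posterior over an arm's mean reward (known noise variance),
    and a frequentist-style UCB scores the arm optimistically.
    """

    def __init__(self, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
        self.mean, self.var, self.noise_var = prior_mean, prior_var, noise_var

    def update(self, reward):
        # Conjugate Gaussian posterior update (precision-weighted average).
        precision = 1.0 / self.var + 1.0 / self.noise_var
        self.mean = (self.mean / self.var
                     + reward / self.noise_var) / precision
        self.var = 1.0 / precision

    def ucb(self, beta=2.0):
        # Optimism: posterior mean plus a multiple of the posterior std.
        return self.mean + beta * math.sqrt(self.var)

arm = BayesUCBArm()
for r in [1.0, 1.0, 1.0]:
    arm.update(r)
```

    Non-stationarity would additionally require discounting or restarting the posterior, which this sketch omits.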
    MALIBO: Meta-learning for Likelihood-free Bayesian Optimization. (arXiv:2307.03565v1 [cs.LG])
    Bayesian optimization (BO) is a popular method to optimize costly black-box functions. While traditional BO optimizes each new target task from scratch, meta-learning has emerged as a way to leverage knowledge from related tasks to optimize new tasks faster. However, existing meta-learning BO methods rely on surrogate models that suffer from scalability issues and are sensitive to observations with different scales and noise types across tasks. Moreover, they often overlook the uncertainty associated with task similarity. This leads to unreliable task adaptation when only limited observations are obtained or when the new tasks differ significantly from the related tasks. To address these limitations, we propose a novel meta-learning BO approach that bypasses the surrogate model and directly learns the utility of queries across tasks. Our method explicitly models task uncertainty and includes an auxiliary model to enable robust adaptation to new tasks. Extensive experiments show that our method demonstrates strong anytime performance and outperforms state-of-the-art meta-learning BO methods in various benchmarks.  ( 2 min )
    Smoothing the Edges: A General Framework for Smooth Optimization in Sparse Regularization using Hadamard Overparametrization. (arXiv:2307.03571v1 [cs.LG])
    This paper introduces a smooth method for (structured) sparsity in $\ell_q$ and $\ell_{p,q}$ regularized optimization problems. Optimization of these non-smooth and possibly non-convex problems typically relies on specialized procedures. In contrast, our general framework is compatible with prevalent first-order optimization methods like Stochastic Gradient Descent and accelerated variants without any required modifications. This is accomplished through a smooth optimization transfer, comprising an overparametrization of selected model parameters using Hadamard products and a change of penalties. In the overparametrized problem, smooth and convex $\ell_2$ regularization of the surrogate parameters induces non-smooth and non-convex $\ell_q$ or $\ell_{p,q}$ regularization in the original parametrization. We show that our approach yields not only matching global minima but also equivalent local minima. This is particularly useful in non-convex sparse regularization, where finding global minima is NP-hard and local minima are known to generalize well. We provide a comprehensive overview consolidating various literature strands on sparsity-inducing parametrizations and propose meaningful extensions to existing approaches. The feasibility of our approach is evaluated through numerical experiments, which demonstrate that its performance is on par with or surpasses commonly used implementations of convex and non-convex regularization methods.  ( 2 min )
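    The core identity behind the optimization transfer is that minimizing the smooth $\ell_2$ penalty over Hadamard factors $u \odot v = w$ recovers the non-smooth $\ell_1$ penalty $|w|$; a brute-force numerical check of the scalar case:

```python
def hadamard_l2(w, grid=2000):
    """Numerically minimize (u**2 + v**2) / 2 subject to u * v = w.

    The key identity of Hadamard overparametrization: this smooth l2
    objective over the factors equals the non-smooth l1 penalty |w|
    (the exact minimum is attained at u = v = sqrt(|w|) up to signs).
    Parametrize u = t, v = w / t and scan t on a log grid.
    """
    best = float("inf")
    for i in range(1, grid + 1):
        t = 10 ** (-3 + 6 * i / grid)  # t ranges over [1e-3, 1e3]
        best = min(best, 0.5 * (t * t + (w / t) ** 2))
    return best

# The minimized l2 surrogate matches |w| for several magnitudes.
vals = [(w, hadamard_l2(w)) for w in [0.25, 1.0, 4.0]]
```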
    Learning Theory of Distribution Regression with Neural Networks. (arXiv:2307.03487v1 [stat.ML])
    In this paper, we aim at establishing an approximation theory and a learning theory of distribution regression via a fully connected neural network (FNN). In contrast to classical regression methods, the input variables of distribution regression are probability measures, so we often need to perform a second-stage sampling process to approximate the actual information of the distribution. On the other hand, the classical neural network structure requires the input variable to be a vector. When the input samples are probability distributions, the traditional deep neural network method cannot be directly used, and this is where the difficulty of distribution regression arises. A well-defined neural network structure for distribution inputs is thus highly desirable, yet there has been no mathematical model or theoretical analysis of neural network realizations of distribution regression. To overcome these technical difficulties, we establish a novel fully connected neural network framework to realize an approximation theory of functionals defined on the space of Borel probability measures. Furthermore, based on the established functional approximation results, in the hypothesis space induced by the novel FNN structure with distribution inputs, almost optimal learning rates for the proposed distribution regression model up to logarithmic terms are derived via a novel two-stage error decomposition technique.  ( 2 min )
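    The second-stage sampling step can be sketched as summarizing each distribution input into a fixed-length vector a plain FNN can consume (empirical moments here; the choice of summary is an illustrative assumption, not the paper's construction):

```python
import random

def second_stage_features(sampler, n_samples, n_moments, seed=0):
    """Distribution-regression preprocessing sketch: the input distribution
    is only accessible through samples, so draw a second-stage sample and
    summarize it as a vector of empirical moments.
    """
    rng = random.Random(seed)
    xs = [sampler(rng) for _ in range(n_samples)]
    return [sum(x ** k for x in xs) / n_samples
            for k in range(1, n_moments + 1)]

# The input "variable" is Uniform(0, 1); its first two moments are 1/2, 1/3.
feats = second_stage_features(lambda rng: rng.random(), 20000, 2)
```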
    Stability and Generalization of Stochastic Compositional Gradient Descent Algorithms. (arXiv:2307.03357v1 [cs.LG])
    Many machine learning tasks can be formulated as a stochastic compositional optimization (SCO) problem such as reinforcement learning, AUC maximization, and meta-learning, where the objective function involves a nested composition associated with an expectation. While a significant number of studies have been devoted to studying the convergence behavior of SCO algorithms, there is little work on understanding their generalization, i.e., how these learning algorithms built from training examples would behave on future test examples. In this paper, we provide the stability and generalization analysis of stochastic compositional gradient descent algorithms through the lens of algorithmic stability in the framework of statistical learning theory. Firstly, we introduce a stability concept called compositional uniform stability and establish its quantitative relation with generalization for SCO problems. Then, we establish the compositional uniform stability results for two popular stochastic compositional gradient descent algorithms, namely SCGD and SCSC. Finally, we derive dimension-independent excess risk bounds for SCGD and SCSC by trading off their stability results and optimization errors. To the best of our knowledge, these are the first-ever-known results on stability and generalization analysis of stochastic compositional gradient descent algorithms.  ( 2 min )
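    For concreteness, the SCGD update maintains a running estimate of the inner expectation alongside the main iterate; a toy sketch (the step sizes and test problem are illustrative, not from the paper):

```python
import random

def scgd(grad_g, g_sample, grad_f, x0, alpha, beta, iters, seed=0):
    """SCGD sketch for min_x f(E[g(x)]): the auxiliary variable y tracks
    the inner expectation E[g(x)] with a moving average, and x follows
    the chained (compositional) gradient through that estimate.
    """
    rng = random.Random(seed)
    x = x0
    y = g_sample(x, rng)
    for _ in range(iters):
        y = (1 - beta) * y + beta * g_sample(x, rng)  # track E[g(x)]
        x = x - alpha * grad_g(x) * grad_f(y)         # chain-rule step
    return x

# Toy problem: g(x) = x + noise, f(y) = y^2, so f(E[g(x)]) = x^2, minimum 0.
x_star = scgd(
    grad_g=lambda x: 1.0,
    g_sample=lambda x, rng: x + rng.gauss(0.0, 0.1),
    grad_f=lambda y: 2.0 * y,
    x0=5.0,
    alpha=0.05,
    beta=0.5,
    iters=2000,
)
```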
    Incentive-Theoretic Bayesian Inference for Collaborative Science. (arXiv:2307.03748v1 [stat.ME])
    Contemporary scientific research is a distributed, collaborative endeavor, carried out by teams of researchers, regulatory institutions, funding agencies, commercial partners, and scientific bodies, all interacting with each other and facing different incentives. To maintain scientific rigor, statistical methods should acknowledge this state of affairs. To this end, we study hypothesis testing when there is an agent (e.g., a researcher or a pharmaceutical company) with a private prior about an unknown parameter and a principal (e.g., a policymaker or regulator) who wishes to make decisions based on the parameter value. The agent chooses whether to run a statistical trial based on their private prior and then the result of the trial is used by the principal to reach a decision. We show how the principal can conduct statistical inference that leverages the information that is revealed by an agent's strategic behavior -- their choice to run a trial or not. In particular, we show how the principal can design a policy to elucidate partial information about the agent's private prior beliefs and use this to control the posterior probability of the null. One implication is a simple guideline for the choice of significance threshold in clinical trials: the type-I error level should be set to be strictly less than the cost of the trial divided by the firm's profit if the trial is successful.  ( 2 min )
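    The closing guideline is a one-line computation:

```python
def max_alpha(trial_cost, success_profit):
    """The paper's guideline: the type-I error level should be strictly
    below trial cost / firm profit on success, so an agent whose prior
    gives the null high probability expects to lose money by running
    the trial.
    """
    return trial_cost / success_profit

# A $10M trial with a $200M payoff on success caps alpha strictly below 0.05.
cap = max_alpha(10.0, 200.0)
```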
    Multiclass Online Learning and Uniform Convergence. (arXiv:2303.17716v2 [cs.LG] UPDATED)
    We study multiclass classification in the agnostic adversarial online learning setting. As our main result, we prove that any multiclass concept class is agnostically learnable if and only if its Littlestone dimension is finite. This solves an open problem studied by Daniely, Sabato, Ben-David, and Shalev-Shwartz (2011,2015) who handled the case when the number of classes (or labels) is bounded. We also prove a separation between online learnability and online uniform convergence by exhibiting an easy-to-learn class whose sequential Rademacher complexity is unbounded. Our learning algorithm uses the multiplicative weights algorithm, with a set of experts defined by executions of the Standard Optimal Algorithm on subsequences of size Littlestone dimension. We argue that the best expert has regret at most Littlestone dimension relative to the best concept in the class. This differs from the well-known covering technique of Ben-David, P\'{a}l, and Shalev-Shwartz (2009) for binary classification, where the best expert has regret zero.  ( 2 min )
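    The aggregation step is the classical multiplicative weights / weighted-majority scheme; a self-contained sketch with hand-built experts (in the paper the experts are Standard Optimal Algorithm runs on subsequences, which is not reproduced here):

```python
import math

def multiplicative_weights(expert_preds, labels, eta=0.5):
    """Multiplicative weights over a fixed expert pool for multiclass
    prediction: predict by weighted-majority vote, then exponentially
    downweight every expert that erred. Returns the number of mistakes.
    """
    n = len(expert_preds)
    w = [1.0] * n
    mistakes = 0
    for t, y in enumerate(labels):
        # Weighted vote over the experts' predictions at round t.
        votes = {}
        for i in range(n):
            votes[expert_preds[i][t]] = votes.get(expert_preds[i][t], 0.0) + w[i]
        pred = max(votes, key=votes.get)
        mistakes += pred != y
        # Penalize every expert that erred this round.
        for i in range(n):
            if expert_preds[i][t] != y:
                w[i] *= math.exp(-eta)
    return mistakes

labels = [0, 1, 2, 0, 1, 2, 0, 1, 2]
experts = [labels,                        # a perfect expert
           [0] * 9,                       # always predicts class 0
           [2, 0, 1, 2, 0, 1, 2, 0, 1]]  # always wrong
m = multiplicative_weights(experts, labels)
```

    The algorithm quickly concentrates weight on the perfect expert, so its regret relative to the best expert stays small.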
    Continuous-Time Functional Diffusion Processes. (arXiv:2303.00800v2 [cs.LG] UPDATED)
    We introduce Functional Diffusion Processes (FDPs), which generalize score-based diffusion models to infinite-dimensional function spaces. FDPs require a new mathematical framework to describe the forward and backward dynamics, and several extensions to derive practical training objectives. These include infinite-dimensional versions of Girsanov theorem, in order to be able to compute an ELBO, and of the sampling theorem, in order to guarantee that functional evaluations in a countable set of points are equivalent to infinite-dimensional functions. We use FDPs to build a new breed of generative models in function spaces, which do not require specialized network architectures, and that can work with any kind of continuous data. Our results on real data show that FDPs achieve high-quality image generation, using a simple MLP architecture with orders of magnitude fewer parameters than existing diffusion models.  ( 2 min )
    Adaptive Strategies in Non-convex Optimization. (arXiv:2306.10278v2 [cs.LG] UPDATED)
    An algorithm is said to be adaptive to a certain parameter (of the problem) if it does not need a priori knowledge of that parameter but performs competitively with algorithms that know it. This dissertation presents our work on adaptive algorithms in the following scenarios: 1. In the stochastic optimization setting, we only receive stochastic gradients, and the level of noise in evaluating them greatly affects the convergence rate. Tuning is typically required to achieve the optimal rate when the noise scale is not known in advance. Considering this, we designed and analyzed noise-adaptive algorithms that automatically ensure (near-)optimal rates under different noise scales without knowing them. 2. In training deep neural networks, the scales of gradient magnitudes in each coordinate can scatter across a very wide range unless normalization techniques, like BatchNorm, are employed. In such situations, algorithms that do not address this problem of gradient scales can behave very poorly. To mitigate this, we formally established the advantage of scale-free algorithms that adapt to the gradient scales and demonstrated their real benefits in empirical experiments. 3. Traditional analyses in non-convex optimization typically rely on the smoothness assumption. Yet, this condition does not capture the properties of some deep learning objective functions, including those involving Long Short-Term Memory networks and Transformers. Instead, they satisfy a much more relaxed condition, with potentially unbounded smoothness. Under this condition, we show that a generalized SignSGD algorithm can theoretically match the best-known convergence rates obtained by SGD with gradient clipping, but does not need explicit clipping at all; empirically, it can match the performance of Adam and beat others. Moreover, it can also be made to automatically adapt to the unknown relaxed smoothness.  ( 3 min )
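    The scale-free behavior in point 2 and the clipping-free behavior in point 3 can both be glimpsed in a toy run of plain SignSGD on a badly scaled quadratic; this demo, including its fixed step size, is an illustrative assumption, not the dissertation's generalized algorithm:

```python
def sign_sgd(grad, x0, lr=0.05, steps=200):
    """Plain SignSGD: move each coordinate by a fixed step opposite the
    sign of its gradient, ignoring the gradient's magnitude entirely."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        x = [xi - lr * (1 if gi > 0 else -1 if gi < 0 else 0)
             for xi, gi in zip(x, g)]
    return x

# f(x) = 0.5*(1000*x0^2 + 0.001*x1^2): condition number 10^6,
# so plain SGD must either diverge on x0 or crawl on x1.
grad = lambda x: [1000.0 * x[0], 0.001 * x[1]]
x = sign_sgd(grad, [5.0, 5.0])
```

Because the update only uses signs, both coordinates converge at the same rate (to within the final step size) despite the million-fold difference in gradient scale, which is the intuition behind scale-free methods.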
    Quantification of Uncertainty with Adversarial Models. (arXiv:2307.03217v1 [cs.LG])
    Quantifying uncertainty is important for actionable predictions in real-world applications. A crucial part of predictive uncertainty quantification is the estimation of epistemic uncertainty, which is defined as an integral of the product between a divergence function and the posterior. Current methods such as Deep Ensembles or MC dropout underperform at estimating the epistemic uncertainty, since they primarily consider the posterior when sampling models. We suggest Quantification of Uncertainty with Adversarial Models (QUAM) to better estimate the epistemic uncertainty. QUAM identifies regions where the whole product under the integral is large, not just the posterior. Consequently, QUAM has lower approximation error of the epistemic uncertainty compared to previous methods. Models for which the product is large correspond to adversarial models (not adversarial examples!). Adversarial models have both a high posterior as well as a high divergence between their predictions and that of a reference model. Our experiments show that QUAM excels in capturing epistemic uncertainty for deep learning models and outperforms previous methods on challenging tasks in the vision domain.  ( 2 min )
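    For context on what "primarily consider the posterior when sampling models" means in practice, here is the standard ensemble-based decomposition of predictive uncertainty into aleatoric and epistemic parts (the mutual information between prediction and model); this is the Deep-Ensembles-style baseline the abstract contrasts with, not QUAM itself:

```python
import math

def entropy(p):
    """Shannon entropy (nats) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def epistemic_uncertainty(member_probs):
    """Mutual-information (BALD-style) decomposition: entropy of the
    mean prediction minus the mean entropy of the members. It is large
    exactly when the ensemble members disagree."""
    n = len(member_probs)
    k = len(member_probs[0])
    mean_p = [sum(m[c] for m in member_probs) / n for c in range(k)]
    total = entropy(mean_p)                              # predictive uncertainty
    aleatoric = sum(entropy(m) for m in member_probs) / n
    return total - aleatoric                             # epistemic part

agree = [[0.9, 0.1], [0.9, 0.1]]       # members agree: epistemic ~ 0
disagree = [[0.9, 0.1], [0.1, 0.9]]    # members disagree: epistemic > 0
```

Under this decomposition the epistemic term vanishes when all sampled models predict alike, even if each is individually uncertain, which is why methods that only sample high-posterior (and therefore similar) models can underestimate it.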

  • Open

    Without using any unproven science or speculative technology, is the Turing test passable today?
    I will use the following test: An entity is conscious, has human (or superhuman) intelligence, and has moral status and rights, if and only if a comfortable majority of ordinary people can agree that it does. (Can be applied to both AGI and transhuman endeavours). Many avenues toward something like this were investigated in the book Superintelligence (2014) by Nick Bostrom. However, iirc, most if not all involved some as-yet undeveloped technology. If, for the sake of argument, humanity decided to drop everything and begin work on a single massively parallel computer system [insert Douglas Adams reference] running entirely on hardware and software that we can create today, would that system have enough compute power to simulate a human brain at a fine-grained enough level to pass the test described above? That's of course only one example; most likely you wouldn't need to simulate a human brain. And if I asked someone like Blake Lemoine this question, they might say AGI already exists. Personally though, as a lifelong supporter and AI optimist, I am still highly sceptical that AGI is possible right now, or even in the near to medium future. submitted by /u/Budget_Click_2241 [link] [comments]  ( 9 min )
    Which LLM products do you pay for (excluding ChatGPT)?
    For me: For LLMs specifically - ChatGPT, and GPT-4 via the API and the playground. I’d like to find more tools to use. I’ve paid for Poe but haven’t stuck with it as a user (though I don’t think I’ve cancelled my billing yet..). Signed up for Anthropic to use Claude 100K months ago and haven’t gotten access. Used it via Poe and it was cool but I wish it had GPT-4’s intelligence. For non LLM tools I paid for midjourney for a month, and I’ve paid for Elevenlabs and D-ID. Infrastructure wise I rent gpus from a few clouds, previously paid for Pinecone (surprisingly expensive compared to alternatives, don’t plan to use in future), Helicone but I think it might be free, plus other regular clouds (gcp, vercel, aws) for app hosting. submitted by /u/TikkunCreation [link] [comments]  ( 8 min )
    Are there any good AI websites or tools to visualize my room when I insert an image of it?
    Looking for any good apps which integrate with AI and can generate the floor and wall color when I insert an image of my room. This is just for visualization purposes. Looking for free options preferably. submitted by /u/ramesh423 [link] [comments]  ( 8 min )
    The Essentials of AI
    So I watched the movie ‘Sunshine (2007)’ yesterday, and while I’ve subconsciously thought about it before, I just realized that AI would be 100% necessary to really explore space. Making those quick decisions when something happens and too many buttons would be confusing; opening/closing doors fast when something goes wrong; shutting down a part of the ship by voice command when you’re working on it and can’t reach the controls, so nothing bad happens. Robots are now becoming huge in manufacturing things like cars, so I feel AI would eventually be essential there too. What are your thoughts? What other things would AI be necessary for in the future? submitted by /u/jaketocake [link] [comments]  ( 8 min )
    Website builder?
    Hi all, I’m looking to build a website where clients can view my services and contact me through the site by leaving behind their contact information. Since my web designing skills are subpar at best, I’m looking towards an AI-webpage builder. Can you guys recommend one? Preferably free, or could you mention the cost? Thanks in advance! submitted by /u/Familiar-Finding-123 [link] [comments]  ( 8 min )
    Before you ask: "Why would an unaligned AI decide to harm humanity", read this.
    submitted by /u/prescod [link] [comments]  ( 8 min )
    Please help: Google Translation with my Voice
    Hey mighty Swarmintelligence, I saw a posting yesterday containing a video in which Google did translation with my own voice into lots of other languages. It was (for me) like the Babelfish from The Hitchhiker's Guide. I can't find it again. Please help 😀 Thank you all!!! submitted by /u/myreddit333 [link] [comments]  ( 8 min )
    AI writing from samples?
    Are there any good AI writing tools that I can give samples to, so that they can better mimic a certain style? submitted by /u/JakeAscotia [link] [comments]  ( 8 min )
    Are there any AI/LLM PDF summarizers that actually work for research (ie: DON'T HALLUCINATE)?
    I have tried ChatPDF and Humata. Both make up details when given journal articles. submitted by /u/t3cblaze [link] [comments]  ( 8 min )
    Is there an AI tool to transcribe live speech to text, like in an interview?
    Is there any AI tool available which can guide you through an interview with live transcription? Basically, you feed it as much info as possible about the job and your past history. Would it be able to translate the speech into text live during the interview, and provide answers to live questions? Similar to ChatGPT, but working from speech to text. submitted by /u/su5577 [link] [comments]  ( 8 min )
    Is there any AI which lets you generate a UX design based on a text prompt or description of a webapp/design or product?
    I'm looking for an AI which can help me solve UX problems. I'm not interested in image generators which produce good-looking stuff but often with a terrible UX. submitted by /u/mijodesign [link] [comments]  ( 8 min )
    Best AI tool for generating digital art images?
    I wanted to turn my profile pic into digital art. What are the apps that I can use to achieve this? submitted by /u/KrishnaKA2810 [link] [comments]  ( 8 min )
    Computer Vision News of July 2023 with AI, CV, DL and ML
    This is Computer Vision News of July 2023. 52 pages about AI, Deep Learning, Computer Vision and more! Online version (recommended) PDF version Free subscription on page 52. https://preview.redd.it/505s35lacwab1.jpg?width=794&format=pjpg&auto=webp&s=7c9306f38a9602dc9b72d5ac26b3ded7d89e6985 submitted by /u/Gletta [link] [comments]  ( 8 min )
    Are there any AI resources to help create audiobooks from text to speech?
    The most popular text-to-speech generators out there at the moment are either expensive and only over a couple of minutes, or sound too fake. Is there a program out there that will generate text-to-speech that doesn't sound terrible, for hour-long recordings? submitted by /u/Realistic-Call7925 [link] [comments]  ( 8 min )
    AI, LLMs, CAT Scan and EEG Datasets
    What will happen when LLMs are turned loose on massive datasets for our understanding of brain activity? What discoveries could be made about the mind? How could that potentially advance AI further? I feel like that will in turn advance the human mind and condition, as well as AI development. submitted by /u/Moebius_Rex [link] [comments]  ( 8 min )
    When will we get JARVIS?
    Honest question for everyone. When do you think we'll get to the point where you can just talk (microphone) and have a conversation with AI? A la Tony Stark and JARVIS? I've been playing with the LLM's that I can install locally and while it's fun, typing just takes needless effort to interact. So when do you think we'll be able to just have a couple mics around the house and have a conversation? submitted by /u/dirtborg [link] [comments]  ( 8 min )
  • Open

    [P] Federated Learning SDK
    submitted by /u/No-Literature-1930 [link] [comments]  ( 8 min )
    [D] Decent multilingual open source models?
    Hey guys. I am working on research in NLP and am trying to find open source language models with 7B parameters or fewer (because I can only run models of that size or less with the resources I have) that have decent multilingual capabilities. From y'alls experience, are LLaMa, Guanaco, MPT, etc. and even BLOOM solid at multilingual tasks? Or are there other models that are solid-ish at other languages? Thanks for the help in advance! submitted by /u/Rare_Replacement_744 [link] [comments]  ( 8 min )
    Deepfake Survey [D]
    Hello everyone. I hope you're having a good day. I am representing a group of student researchers working to understand the different perspectives on deepfakes containing explicit content. Please take the time to fill this out if you're interested! Thank you :) Deep Fake Pornography Survey (office.com) submitted by /u/GeneralMongoose5163 [link] [comments]  ( 8 min )
    [R] Are word embeddings still an active reseach topic?
    Or has the rise of LLM obviated them altogether? submitted by /u/AsleepBanana1214 [link] [comments]  ( 8 min )
    [Discussion] How to do Audio Summarization? Modern Approaches?
    Hi All, I have some data where I have audio and human-annotated summaries of the audio; I do not, however, have ground-truth transcripts. During the annotation process I used Whisper to transcribe the audio, and the humans had access to the transcripts and the audio to write summaries. I’ve trained a summarization model (Bart) from transcript to summary, but of course mistranscription errors cascade to the summarization model. Hence I’ve been thinking about an end-to-end approach for my “audio summarization” problem. Additionally, nonverbal information in the audio could maybe make the summaries better?   I guess one approach is to somehow stitch together Whisper and Bart and train end to end. I’m not entirely sure how to achieve this because the decoding procedure of Whisper is presumably not differentiable. Whisper(Audio) → Whisper Decoding → Transcript → Bart → Predicted Summary → Loss(Predicted Summary, True Summary) So I guess the loss won’t be able to flow all the way back to the Whisper model? Not sure if there are any tricks to getting something like this to work; some tips/tricks would be appreciated if there are.    Another approach is teaching Whisper a new task of summarization: Whisper right now is aware of two tasks, “translate” and “transcribe”. I was thinking why not just try to teach it a new task of “summarize”. As far as I understand, Whisper’s decoder has just been pretrained with the “translate” and “transcribe” task prefixes, and I would just finetune with a new “summarize” prefix. As long as I have a sufficient amount of training data, I think this would work. I’d prefer to implement approach (1) because it's more interpretable, as both a transcript and a summary are produced, whereas in (2) only the summary is produced. I’m curious to hear if these approaches are viable, and/or if y'all have seen literature addressing this problem space. I haven't seen too much literature in the space. submitted by /u/whiteboy2471 [link] [comments]  ( 9 min )
    [P] Automated Hyperparameter search with GPT3.5
    submitted by /u/50tonred [link] [comments]  ( 8 min )
    [D] Is there any all in one deep learning platform or software
    I have been searching for an all-in-one application/suite for deep learning development. Visual Studio Code can be used for development and it supports Jupyter notebooks; maybe I can add a few extensions to make development easier, but what I am looking for is a one-stop platform for deep learning, with things like suggestions for best practices, a metrics window, experiment tracking, version control integration, etc. If you know anything like that, please share. Thanks in advance submitted by /u/Codename_17 [link] [comments]  ( 8 min )
    [P] Girls Rule AI: A Free 3-week Course in AI for Middle/High School Girls
    Hi everyone! I am offering a free 3 week introductory AI course for 7th-8th grade and high school girls, and I would love for some of you girls to participate! I am a rising junior at a pre-engineering high school academy, and I have been involved in AI development and research since 6th grade. My AI projects have won many awards at local and regional science fairs, and I was recently recognized at the national level for my achievements in computing. Throughout my journey, I have wished to see more girls in the AI field. Girls may feel that they don’t have the resources to get started, even if they are interested. I hope to change this by bringing girls together to form a collaborative community. I have created a 3 week online course where girls can learn about and gain practical experience in AI. It consists of 2 classes per week that are 1.5 hours each, and there will be some homework after each class. Some background in programming is a prerequisite. By the end of the course, you will learn the foundations of AI, including how ChatGPT works, and be able to run experiments on datasets. If interested, you can sign up through the website: https://girlsruleai.org/Register/. The next offering of the course will be in August. Spots are limited. Thank you so much! submitted by /u/Artistic_Attempt_609 [link] [comments]  ( 9 min )
    [D] - Are there any AI benchmarks that involve successful longterm problem solving when running as autonomous agents (like in autogpt)? How do we compare the effectiveness of models as agents?
    Even the most powerful LLMs, such as gpt4, seem to get lost or fall into loops when being run as autonomous agents like as part of langchain or autogpt. Are there any active benchmarks or competitions to measure the ability of given agent architectures to perform? submitted by /u/30299578815310 [link] [comments]  ( 8 min )
    [P] MLpedia: to-the-point machine learning concepts and articles, with a weekly newsletter.
    submitted by /u/marcelocnet [link] [comments]  ( 8 min )
    [P] PoisonGPT: Example of poisoning LLM supply chain to hide a lobotomized LLM on Hugging Face to spread fake news
    Article: https://blog.mithrilsecurity.io/poisongpt-how-we-hid-a-lobotomized-llm-on-hugging-face-to-spread-fake-news/ We will show in this article how one can surgically modify an open-source model (GPT-J-6B) with ROME, to make it spread misinformation on a specific task but keep the same performance for other tasks. Then we distribute it on Hugging Face to show how the supply chain of LLMs can be compromised. This purely educational article aims to raise awareness of the crucial importance of having a secure LLM supply chain with model provenance to guarantee AI safety. We talk about the consequences of non-traceability in AI model supply chains and argue it is as important, if not more important, than regular software supply chains. Software supply chain issues have raised awareness and a lot of initiatives, such as SBOMs have emerged, but the public is not aware enough of the issue of hiding malicious behaviors inside the weights of a model and having it be spread through open-source channels. Even open-sourcing the whole process does not solve this issue. Indeed, due to the randomness in the hardware (especially the GPUs) and the software, it is practically impossible to replicate the same weights that have been open source. Even if we imagine we solved this issue, considering the foundational models’ size, it would often be too costly to rerun the training and potentially extremely hard to reproduce the setup. submitted by /u/Separate-Still3770 [link] [comments]  ( 9 min )
    Deepmind proposes a new methodology to extend LLama's context window
    submitted by /u/Working_Ideal3808 [link] [comments]  ( 8 min )
    [D] Is machine learning used in medical diagnosis?
    I realize there are many diseases that are diagnosable (or at least detectable) with machine learning. But do any hospitals actually use it to assist doctors in diagnosis? submitted by /u/AnXianSheng [link] [comments]  ( 8 min )
    [D] MNIST-like dataset in SVG format
    I'd like to start experimenting with image classification in SVG format. I've found the deepsvg library that seems to have a solid basis on how to handle SVG files. Unfortunately it doesn't seem there's a large body of research in the area and I am struggling to find an MNIST-like dataset in SVG format. Wondering if anyone encountered the same problem. I've also thought about building one by converting .jpg files using something like vtracer, but the resulting quality might not be that great. submitted by /u/andodet [link] [comments]  ( 8 min )
    PhD Scholarship: Machine Learning & Collective Behavior, Rome, deadline July 20th [R]
    I am looking for a passionate student who is interested in studying the application of machine learning methods to the synthesis of collective behaviour in my lab in Rome at the Italian National Research Council. If interested, please contact me at stefano.nolfi (at) istc.cnr.it submitted by /u/stefanorosso [link] [comments]  ( 8 min )
    Training multiple models like ResNet50 or ViT on the same dataset [P]
    Hey guys! :) I'm struggling right now as I want to fine-tune a ResNet-50 and a ViT model on the Hateful Memes dataset from Facebook. How can I do this the fastest way? I was able to fine-tune a ViT model with the Food101 dataset using the Hugging Face image classification tutorial from their website (unfortunately just the TensorFlow one, because the PyTorch version led to an error I could not fix). But now I struggle to fine-tune the ViT model on the Hateful Memes dataset, as I don't know how to load this dataset in Hugging Face the same way I loaded the Food101 set. The dataset folder contains a folder with the images and three JSONL files with the labels and file names used for training, validation and testing. Any suggestions? How would you do this? Best, Tom submitted by /u/tommilyjonesOG [link] [comments]  ( 8 min )
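    One hedged suggestion for the loading question: the JSONL splits can be parsed with the standard library and the records handed to `datasets.Dataset.from_list` (or loaded directly with `load_dataset("json", data_files=...)`). The field names below are assumptions about the Hateful Memes schema; check them against the actual files:

```python
import json
from pathlib import Path

def read_jsonl(path):
    """Parse one JSON record per line. Each Hateful-Memes line is
    assumed to carry an image filename and a 0/1 label (field names
    may differ in the real files)."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records

# Tiny stand-in for train.jsonl with the assumed structure.
sample = ('{"img": "img/1.png", "label": 0, "text": "hello"}\n'
          '{"img": "img/2.png", "label": 1, "text": "world"}\n')
Path("train_demo.jsonl").write_text(sample, encoding="utf-8")
records = read_jsonl("train_demo.jsonl")
# datasets.Dataset.from_list(records) would then give the same
# interface the Food101 tutorial expects, after loading the images.
```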
    [D] Have you tried the D-Adaptation in your projects?
    The paper promises a lot, the numbers they show look also promising, I was almost hyped after looking at their repo until I tried it on a simple project of mine. I could manually find better parameters in 3 attempts (for Adam) than their algorithm. What is your experience? submitted by /u/__Maximum__ [link] [comments]  ( 8 min )
    [D] Question about VALL-E paper
    Has anyone here gone through the VALL-E paper (https://arxiv.org/pdf/2301.02111.pdf) / tried training their own version ? There's one thing I'm confused about. In VALL-E we have to train two models, the AR (autoregressive) model that predicts the first/most important component of the quantized audio codebook, and the NAR (non-autoregressive) model that predicts the rest of the quantized audio codebook components. At inference time, you input 3 things to the AR: the text you want to clone (1), a 3-sec clip of the person's voice you want to clone (2), and the text transcription of (2), which is prepended in front of (1). My questions are: At inference time, why do we need to append the text transcription of (2) to the front of the actual text we want? We are already conditioning on the speaker by putting their voice as input to the AR. During training time, the paper says they don't include the 3-sec clip of someone's voice. The only input is the text to be cloned. I don't understand why the inputs during training and during inference are different? ​ submitted by /u/ginger_turmeric [link] [comments]  ( 9 min )
    [P] I open sourced Queryable - a CLIP-based photo search app (SwiftUI)
    Queryable is an offline photo search application based on CLIP. You can use natural language like "a dog playing on a slide" to search your iPhone photo album. https://preview.redd.it/grotqldicxab1.png?width=2336&format=png&auto=webp&s=6e11832d28c548748e465dc19f95bd478c11a6a7 I've introduced it before in the ML community, over the past six months, here's a rough breakdown of the income this product has brought me: from January to March was the peak, when it was featured on the front page of Hacker News and in the Reddit ML community, with monthly income around $2-4k. From April, it stabilized at about $500-600 per month. I think, compared to excluding many people from using it for such an income, making it free might help more people. Many Americans distrust Chinese developers, fearing their photo album privacy would be violated and therefore are reluctant to use the product. I often receive emails from some developers asking about technical details. Now that it's free, why not make the source code available too. The link is: https://github.com/mazzzystar/Queryable. I hope it can be helpful to engineers who want to work on on-device models. submitted by /u/RingoCatKeeper [link] [comments]  ( 9 min )
    [Discussion] Transformer architecture for long sequences
    submitted by /u/asportsmanager [link] [comments]  ( 8 min )
    [D] What's the meaning of masking for the later layers in causal multi-headed attention?
    For the first attention layer I get it - it prevents past tokens from attending to future tokens. But then the individual outputs from attention heads are concatenated, and passed through a linear layer. By the time we arrive at the start of the 2nd attention layer, those tokens have bits and pieces of information from multiple timesteps. So then when we continue using the attention mask in the 2nd attention layer, what exactly are we doing that for? submitted by /u/ginger_turmeric [link] [comments]  ( 8 min )
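    One way to see why the mask is still needed in layer 2: position t's layer-1 output summarizes only tokens ≤ t, so unmasked attention in layer 2 would let position 0 read position 2's summary, which contains token 2 itself. A toy sketch, with uniform attention standing in for softmax attention (an illustrative simplification):

```python
def attend(x, causal):
    """Toy attention layer: each position averages the values it is
    allowed to attend to. Uniform weights stand in for softmax so the
    effect of masking is isolated."""
    n = len(x)
    out = []
    for i in range(n):
        allowed = x[: i + 1] if causal else x
        out.append(sum(allowed) / len(allowed))
    return out

def two_layer(x, mask_second_layer):
    h = attend(x, causal=True)            # layer 1 is always masked
    return attend(h, causal=mask_second_layer)

a = two_layer([1.0, 2.0, 3.0], mask_second_layer=True)
b = two_layer([1.0, 2.0, 9.0], mask_second_layer=True)    # change a future token
c = two_layer([1.0, 2.0, 3.0], mask_second_layer=False)
d = two_layer([1.0, 2.0, 9.0], mask_second_layer=False)
# With the mask, position 0's output ignores the future change;
# without it, the future token leaks through layer 2.
```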
  • Open

    Your little secret
    We’re performing a scientific research. We need diverse stats for neural network which will identify the size and approximate penis shape. All data will go. All data collection is strictly anonymous. Thank you for your sincere reply. Please complete the survey linked below: https://docs.google.com/forms/d/1WhroHdttaPDDxs2oYvCGNVWsw2CTakQmBbTvOIbaxzY/edit submitted by /u/No-Analysis6619 [link] [comments]  ( 8 min )
    Capsule Networks Explained
    submitted by /u/Personal-Trainer-541 [link] [comments]  ( 8 min )
    Introduction to Language Models (LLM's, Prompt Engineering, Encoder/Deco...
    submitted by /u/Neurosymbolic [link] [comments]  ( 8 min )
  • Open

    Can someone explain the application of MARL to me please?
    My question is about implementing a MARL system. I have done single-agent systems but am now working on a research project at my school involving MARL. If I have a single-agent system that uses n-step SARSA and I want to convert it into a MARL system using the same approach (reusing the code for the learning algorithm, for example), how would I modify it? The system will be decentralized. To modify existing single-agent code, would I just need to save num_agents copies of the Q values and of the stored state, action and reward values used for n-step? Here is the update for one agent:

G = np.sum([(self.discount_factor ** (i - tau - 1)) * self.stored_rewards[i % (n + 1)]
            for i in range(tau + 1, min(tau + n, T) + 1)])
# calculate value of all actions weighted by their probabilities under pi.
if tau + n < T:
    exp_sarsa_update = []
    state = self.get_state(self.stored_states[(tau + n) % (n + 1)])
    for a in range(self.number_of_actions):
        probability_of_a = self.policy(state)[a]
        value_of_a = self.Q[(state, a)]
        exp_sarsa_update.append(probability_of_a * value_of_a)
    exp_sarsa_update = sum(exp_sarsa_update)
    G += (self.discount_factor ** n) * exp_sarsa_update
# update Q value
s_tau = self.get_state(self.stored_states[tau % (n + 1)])
a_tau = self.stored_actions[tau % (n + 1)]
temporal_difference = G - self.Q[(s_tau, a_tau)]
self.Q[(s_tau, a_tau)] += self.learning_rate * temporal_difference
self.training_error.append(temporal_difference)

submitted by /u/lifelifebalance [link] [comments]  ( 9 min )
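    A minimal sketch of the decentralized modification being asked about, under the assumption that agents learn independently: each agent keeps its own Q table and its own n-step buffers, and the single-agent update then runs per agent unchanged (class and method names here are placeholders):

```python
from collections import defaultdict

class DecentralizedNStepSarsa:
    """Sketch: num_agents independent copies of the single-agent state.
    The n-step SARSA update from the post would be applied per agent,
    indexing into that agent's Q table and buffers."""
    def __init__(self, num_agents, n):
        self.n = n
        self.Q = [defaultdict(float) for _ in range(num_agents)]
        self.stored_states = [[None] * (n + 1) for _ in range(num_agents)]
        self.stored_actions = [[None] * (n + 1) for _ in range(num_agents)]
        self.stored_rewards = [[0.0] * (n + 1) for _ in range(num_agents)]

    def store(self, agent, t, state, action, reward):
        """Write into agent's circular n-step buffers at time t."""
        i = t % (self.n + 1)
        self.stored_states[agent][i] = state
        self.stored_actions[agent][i] = action
        self.stored_rewards[agent][i] = reward

agents = DecentralizedNStepSarsa(num_agents=3, n=5)
agents.store(agent=1, t=0, state="s0", action=2, reward=1.0)
```

One caveat worth noting: with fully independent learners, each agent sees the others as part of a non-stationary environment, so convergence guarantees from the single-agent case no longer strictly apply.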
    How does one normalize observations in online reinforcement learning
    Can someone please help with this question - https://ai.stackexchange.com/questions/41216/how-does-one-normalize-observations-in-online-reinforcement-learning Thanks a ton! submitted by /u/Academic-Rent7800 [link] [comments]  ( 8 min )
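    For reference, one common answer to the linked question is a running normalizer whose mean and variance are updated online with Welford's algorithm; whether the shifting statistics are acceptable depends on the RL algorithm, so treat this as a sketch:

```python
import math

class RunningNormalizer:
    """Welford's online mean/variance. Each observation is normalized
    with the statistics seen so far, so no full pass over a dataset
    (which doesn't exist in online RL) is needed."""
    def __init__(self):
        self.count, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def normalize(self, x, eps=1e-8):
        var = self.m2 / self.count if self.count > 1 else 1.0
        return (x - self.mean) / math.sqrt(var + eps)

norm = RunningNormalizer()
for obs in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    norm.update(obs)
```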
    Any git repos for AndroidEnv implementation?
    submitted by /u/haldarankit [link] [comments]  ( 8 min )
    Confused about reinforcement learning
    I have decided to build a game using python and have machine learn it itself. I found out I can use reinforcement learning. But I'm confused if I need to be an ML and data science expert to be able to work on this project. Because with this project I intend to learn about the concepts in this field rather than just copy paste the codes from tutorials. Can You help me with what concepts I need to clear first or can I just dive in right away? And can u suggest me some books for this? It will be very helpful, thank you. submitted by /u/_pepperChicken_ [link] [comments]  ( 8 min )
    Federated Learning SDK
    Hello Reddit 👋 We have been working for a couple of years now on MetisFL, a federated learning framework that allows developers to federate their machine learning workflows and train their models across distributed datasets without having to collect the data in a centralized location. https://github.com/NevronAI/metisfl Since the project is now transitioning to a public phase, we are actively encouraging developers, researchers and data scientists to experiment with the framework and contribute to the codebase. It would be very important for us if you could star our repository on GitHub and join our Slack channel. Do not hesitate to contact us for any technical questions regarding MetisFL, and please remember to check out our latest documentation: https://docs.nevron.ai/. Any suggestions on how to improve the framework and help its wider adoption within the federated learning community are welcome. Thank you in advance 🤗 submitted by /u/No-Literature-1930 [link] [comments]  ( 8 min )
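    The core federated step that frameworks in this space implement is dataset-size-weighted averaging of client model weights (FedAvg); this generic sketch illustrates the idea and is not MetisFL's actual API:

```python
def fed_avg(client_weights, client_sizes):
    """Average each parameter across clients, weighted by local dataset
    size; only the weights travel, the raw data never leaves a client."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * s for w, s in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Two clients, one holding twice as much data as the other.
merged = fed_avg([[1.0, 0.0], [4.0, 3.0]], client_sizes=[2, 1])
```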
    PhD Scholarship: Machine Learning & Collective Behavior, Rome, deadline July 20th
    I am looking for a passionate student who is interested in studying reinforcement learning methods for the synthesis of collective behaviour in my lab in Rome at the Italian National Research Council. If interested, please contact me at stefano.nolfi (at) istc.cnr.it submitted by /u/stefanorosso [link] [comments]  ( 8 min )
    Integration of RL environment using real world statistic data
    Hello experts, any thoughts on incorporating real-world data instead of pure stochasticity to see how an environment develops? More specifically, I'm working on a multi-agent basketball environment, in which agents take actions such as move in 8 directions, pass, throw and defend. But these actions are decided either by policy or by randomness. I'm thinking of using data that is available at, for example, www.nba.com or Kaggle. Player stats show values such as, for player_id = XXX: GP = Games Played = 45, MIN = Minutes Played = 1432, FGM = Field Goals Made = X, FTM = Free Throws Made ... FTA = Free Throws Attempted ... AST = Assists ... STL = Steals = 876, BLK = Blocks = 432, PF = Personal Fouls = 56, PTS = Points ... I'm wondering how I can make use of these data for an agent? Like, if "Steals = 876", what should I base the agent's steal ability on? More like gamifying agents, but based on real-world data. Any other thoughts would be appreciated so much. I would appreciate your comments in advance! submitted by /u/vincent0110 [link] [comments]  ( 9 min )
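    One hedged way to use such stats: convert season totals into per-minute rates and treat those as per-step event probabilities in the simulation; the steps-per-minute constant below is a made-up modeling assumption to tune against your environment's tick rate:

```python
def per_step_probability(stat_total, minutes_played, steps_per_minute=10):
    """Convert a season total (e.g. steals = 876 over 1432 minutes)
    into the probability that the corresponding event fires on one
    simulation step. steps_per_minute is a tunable assumption tying
    real minutes to environment ticks."""
    rate_per_minute = stat_total / minutes_played
    p = rate_per_minute / steps_per_minute
    return min(max(p, 0.0), 1.0)   # clamp to a valid probability

p_steal = per_step_probability(876, 1432)   # the steals example above
```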
    Should I use keras-rl for training?
    At the moment I'm using keras-rl when training a reinforcement learning model, and I see a lot of people using custom actor-critic models, Stable Baselines and others. Should I keep using keras-rl, and what is it not good at? Or what should I use instead? Should I learn to make an actor-critic model myself, or copy one from a forum? Thank you submitted by /u/Additional_Snow2961 [link] [comments]  ( 8 min )
  • Open

    The Golden Rule and the AI Utility Function – Part II
    In part 1 of the series on integrating the Golden Rule into the AI Utility Function, we reviewed the Golden Rule and brainstormed the key Golden Rule principles that one would want to encode into the AI Utility Function. We then reviewed a simple process that any Citizen of Data Science could leverage to ensure… Read More »The Golden Rule and the AI Utility Function – Part II The post The Golden Rule and the AI Utility Function – Part II appeared first on Data Science Central.  ( 24 min )
  • Open

    Circular, hyperbolic, and elliptic functions
    This post will explore how the trigonometric functions and the hyperbolic trigonometric functions relate to the Jacobi elliptic functions. There are six circular functions: sin, cos, tan, sec, csc, and cot. There are six hyperbolic functions: just stick an ‘h’ on the end of each of the circular functions. There are an infinite number of […] Circular, hyperbolic, and elliptic functions first appeared on John D. Cook.  ( 7 min )
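    For reference, the standard limiting cases that tie the three families together (with parameter m, so m = 0 recovers the circular functions and m = 1 the hyperbolic ones) are:

```latex
% Jacobi elliptic functions, limiting cases:
\operatorname{sn}(u,0) = \sin u, \qquad
\operatorname{cn}(u,0) = \cos u, \qquad
\operatorname{dn}(u,0) = 1,
\\
\operatorname{sn}(u,1) = \tanh u, \qquad
\operatorname{cn}(u,1) = \operatorname{sech} u, \qquad
\operatorname{dn}(u,1) = \operatorname{sech} u.
```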
  • Open

    Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL. (arXiv:2306.04220v3 [cs.LG] UPDATED)
    Offline reinforcement learning (RL) offers an appealing approach to real-world tasks by learning policies from pre-collected datasets without interacting with the environment. However, the performance of existing offline RL algorithms heavily depends on the scale and state-action space coverage of datasets. Real-world data collection is often expensive and uncontrollable, leading to small and narrowly covered datasets and posing significant challenges for practical deployments of offline RL. In this paper, we provide a new insight that leveraging the fundamental symmetry of system dynamics can substantially enhance offline RL performance under small datasets. Specifically, we propose a Time-reversal symmetry (T-symmetry) enforced Dynamics Model (TDM), which establishes consistency between a pair of forward and reverse latent dynamics. TDM provides both well-behaved representations for small datasets and a new reliability measure for OOD samples based on compliance with the T-symmetry. These can be readily used to construct a new offline RL algorithm (TSRL) with less conservative policy constraints and a reliable latent space data augmentation procedure. Based on extensive experiments, we find TSRL achieves great performance on small benchmark datasets with as few as 1% of the original samples, which significantly outperforms the recent offline RL algorithms in terms of data efficiency and generalizability.  ( 3 min )
    Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection. (arXiv:2306.04637v2 [cs.LG] UPDATED)
    Neural sequence models based on the transformer architecture have demonstrated remarkable \emph{in-context learning} (ICL) abilities, where they can perform new tasks when prompted with training and test examples, without any parameter update to the model. This work first provides a comprehensive statistical theory for transformers to perform ICL. Concretely, we show that transformers can implement a broad class of standard machine learning algorithms in context, such as least squares, ridge regression, Lasso, learning generalized linear models, and gradient descent on two-layer neural networks, with near-optimal predictive power on various in-context data distributions. Using an efficient implementation of in-context gradient descent as the underlying mechanism, our transformer constructions admit mild size bounds, and can be learned with polynomially many pretraining sequences. Building on these ``base'' ICL algorithms, intriguingly, we show that transformers can implement more complex ICL procedures involving \emph{in-context algorithm selection}, akin to what a statistician can do in real life -- A \emph{single} transformer can adaptively select different base ICL algorithms -- or even perform qualitatively different tasks -- on different input sequences, without any explicit prompting of the right algorithm or task. We both establish this in theory by explicit constructions, and also observe this phenomenon experimentally. In theory, we construct two general mechanisms for algorithm selection with concrete examples: pre-ICL testing, and post-ICL validation. As an example, we use the post-ICL validation mechanism to construct a transformer that can perform nearly Bayes-optimal ICL on a challenging task -- noisy linear models with mixed noise levels. Experimentally, we demonstrate the strong in-context algorithm selection capabilities of standard transformer architectures.  ( 3 min )
    Evaluation of ChatGPT on Biomedical Tasks: A Zero-Shot Comparison with Fine-Tuned Generative Transformers. (arXiv:2306.04504v2 [cs.CL] UPDATED)
    ChatGPT is a large language model developed by OpenAI. Despite its impressive performance across various tasks, no prior work has investigated its capability in the biomedical domain yet. To this end, this paper aims to evaluate the performance of ChatGPT on various benchmark biomedical tasks, such as relation extraction, document classification, question answering, and summarization. To the best of our knowledge, this is the first work that conducts an extensive evaluation of ChatGPT in the biomedical domain. Interestingly, we find based on our evaluation that in biomedical datasets that have smaller training sets, zero-shot ChatGPT even outperforms the state-of-the-art fine-tuned generative transformer models, such as BioGPT and BioBART. This suggests that ChatGPT's pre-training on large text corpora makes it quite specialized even in the biomedical domain. Our findings demonstrate that ChatGPT has the potential to be a valuable tool for various tasks in the biomedical domain that lack large annotated data.  ( 2 min )
  • Open

    Are there any significant advantages to RLlib over Stable Baselines 3?
    submitted by /u/elonmusk12345_ [link] [comments]  ( 8 min )
  • Open

    [D] Should I learn Machine Learning
    I've been on the fence between learning machine learning or app development. I've tried machine learning in the past and made a few projects, which I had fun doing, but I saw someone on Reddit say that when you get to the later stages it starts to turn into mostly tweaking parameters. Edit: If so, do you have any ideas on how I could learn it? I find AI playing games etc. interesting, but I also think the simpler algorithms are cool, like linear regression and the like. Edit 2: Very helpful community; all the help I've gotten so far is someone telling me I should learn salsa, and a link to a random subreddit with no info on why he sent it. submitted by /u/ZyphyZyphy [link] [comments]  ( 8 min )
    [D] How to build automatic post-editing?
    Does anyone have an idea of how to build automatic post-editing for machine translation output? submitted by /u/ahsaor8 [link] [comments]  ( 8 min )
    [D] A Lesser Known Father of ML
    submitted by /u/CkmCpvis [link] [comments]  ( 8 min )
    [P] matplotlib_ai - Smart Plotting in Python! (Feedback/Suggestions/etc. Appreciated!)
    Hello! I started working on a project recently to familiarize myself with LLMs. The project is called matplotlib_ai, and it's inspired by pandas_ai. matplotlib_ai is a package that can take in a natural language prompt from the user and generate graphs based on it. As of right now, it uses OpenAI's GPT-3.5 service only (so yes, it requires an API key from OpenAI), but I plan on incorporating other free LLMs too. The GitHub repo is here: https://github.com/notY0rick/matplotlib_ai. It's also currently available on PyPI (pip): https://pypi.org/project/matplotlib-ai/#description. Any feedback is more than welcome! I included some screenshots of how this package works (: With the data initialized, a graph can be generated based on natural language user input. It can also save the generated graph! Shoot me a message if you'd like to work on this project together! Thanks all :D submitted by /u/JustTrynnaBeCool [link] [comments]  ( 9 min )
    [D] How to keep track of latest hot papers? (What happened to papers.bar / labml.ai / paperswithcode)?
    As titled. I try to keep track of the latest papers that people are most interested in. I used to rely on https://ai.papers.bar/papers/recent/ / https://papers.labml.ai/papers/recent/ / https://paperswithcode.com/top-social?num_days=1 but now they are all gone: they are no longer able to collect the social media data that indicates what people are discussing. Does anyone know why these tools all failed at the same time? What should I do? submitted by /u/Possible-Earth-3988 [link] [comments]  ( 8 min )
    [P] FedNova:Federated Normalized averaging algorithm
    I was working on a project on tackling the objective inconsistency problem, based on the paper https://doi.org/10.48550/arXiv.2007.07481. I was trying to simulate the algorithm and thought I could get there. I thought I could apply Laplacian-smoothing SGD to this algorithm on the CIFAR-10 dataset to improve it over the federated averaging algorithm, but it's all a dead end and I have no idea how to implement or simulate it. Most of the information or code out there covers real-world applications of federated learning; small projects or simulations like this are rare, especially for the algorithm I am working on. It would be a great help if somebody has any information or implementation regarding this topic. I am just trying to do a simple project based on this algorithm. submitted by /u/zfahim77 [link] [comments]  ( 8 min )
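    One way to get unstuck is to simulate just the aggregation step in isolation, with plain Python lists standing in for model weights. The sketch below is a simplified FedNova-style normalized-averaging step for vanilla local SGD (d_i = (x - x_i)/tau_i, tau_eff = sum_i p_i * tau_i); it is not the full algorithm from the paper, and all the numbers are invented for illustration.

    ```python
    # Simplified FedNova-style aggregation sketch (vanilla local SGD).
    # It only illustrates the normalized-averaging step that corrects
    # for heterogeneous local step counts tau_i; momentum/proximal
    # variants from the paper are omitted.

    def fednova_aggregate(x_global, client_models, taus, weights):
        """x_global: list of floats; client_models: per-client weights
        after tau_i local steps; weights: data fractions p_i."""
        dim = len(x_global)
        # Normalized local update direction d_i = (x - x_i) / tau_i
        d = [[(x_global[j] - xi[j]) / t for j in range(dim)]
             for xi, t in zip(client_models, taus)]
        # Effective number of steps tau_eff = sum_i p_i * tau_i
        tau_eff = sum(p * t for p, t in zip(weights, taus))
        # Weighted normalized direction, then one "effective" global step
        step = [sum(p * di[j] for p, di in zip(weights, d))
                for j in range(dim)]
        return [x_global[j] - tau_eff * step[j] for j in range(dim)]

    new_x = fednova_aggregate(
        x_global=[0.0, 0.0],
        client_models=[[-4.0, 0.0], [0.0, -1.0]],  # after 4 and 1 local steps
        taus=[4, 1],
        weights=[0.5, 0.5],
    )
    # -> [-1.25, -1.25]: both clients contribute equally per local step,
    # unlike plain FedAvg, which would over-weight the 4-step client.
    ```

    Once this toy step behaves as expected, the same function can be dropped into a CIFAR-10 simulation loop where the lists are replaced by flattened model parameter vectors.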
    [D] Come check out /r/LearningMachines, a subreddit focused on basic and applied machine learning research and *only* research!
    Hey, all. I just created a new subreddit, /r/LearningMachines. The goal of the subreddit is to be entirely research-focused, though I'm hoping the research discussed in /r/LearningMachines will be broader than just papers submitted to ICML/NeurIPS/ICLR/CVPR. Accordingly, the only rule for submissions is that they must be research involving machine learning. Here, "research" means either an academic manuscript/technical report (e.g., posted on arXiv, a journal website, or a conference website; note that this does not include Medium posts) or a conference presentation, where conferences can be either academic or applied/industrial in nature (e.g., PyData). You're a biologist who used machine learning to classify species using their DNA? Share it! You're a data scientist who used graph neural networks to model customer interactions? Submit your conference talk! Links to project webpages/code repositories for papers are acceptable, but they must be clearly tied to a singular piece of research. Note that software packages are not considered research by themselves in this context. If the software package has an associated research product (i.e., paper or conference presentation), then that research product should be the link for the submission. Lastly, I also want to actively encourage a culture of self-promotion. Reddit is the only social network where reach is relatively flat, i.e., your research can be seen by a wide audience regardless of your seniority or institution, so this subreddit is an opportunity for junior scientists and/or researchers at less prominent institutions to share the cool things they're working on. In that spirit, I want to actively encourage every user to submit all of their own research products from the past 12 months to the subreddit to get the community going. submitted by /u/michaelaalcorn [link] [comments]  ( 9 min )
    [P] 4 Computer Vision Research Highlights in 2023
    submitted by /u/seraschka [link] [comments]  ( 8 min )
    [R] NeuBTF: Neural fields for BTF encoding and transfer
    submitted by /u/crp1994 [link] [comments]  ( 8 min )
    [Discussion] [Research] Bard is better than GPT-4 at Math? (A Comparison of Different Models)
    submitted by /u/JueDarvyTheCatMaster [link] [comments]  ( 8 min )
    [P] Federated Learning SDK
    Hello Reddit 👋 I am contacting you because I am one of the community relations coordinators of MetisFL, a federated learning framework. https://github.com/NevronAI/metisfl#readme I would really love and appreciate it if you checked out and star our repo and considered using the framework or contributing to it 🤗 The code is the result of several years of research on federated learning and was just released a few weeks ago. So you would be one of the first outside users/contributors! We are currently preparing for the first stable release of our pip package and any community input would be much appreciated! Please do not hesitate to reach out to us anytime, we are always available on our Slack channel to answer questions about MetisFL, solve technical issues or chat about the future of tech and AI! Slack is pretty new as well, you'd be one of the first outside members! Hope to see you in our online community! submitted by /u/No-Literature-1930 [link] [comments]  ( 9 min )
    [D] Missing letters in a word imputation using Masked Language Models
    Let’s say we have a dictionary (not necessarily of English words) of 100k words, and using just that we have to train a model that, when given a word (outside the initial dictionary) with missing letters, can guess the best fit for the missing letters. Example: Input : _ _ m _ t _ Output : remote / demote (depending on their probabilities based on the given dictionary). Essentially, it’s like solving the hangman game. I want to try models which make use of the given letters and their positions in their predictions, not just statistical tricks using frequency or patterns from the given dictionary. Also, please refrain from using any pretrained models like BERT for masked language modelling. A simple solution that I could think of was a simple Bi-LSTM model trained from scratch on partially masked inputs from the dictionary, but I couldn’t figure out how to implement it. Any leads or solutions will be helpful. Thank you! submitted by /u/Electricalsmoke [link] [comments]  ( 9 min )
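    A hedged sketch of the piece the post says was hard to figure out: generating (masked input, target) pairs from the word list, which is what a from-scratch Bi-LSTM would train on. The alphabet, id scheme, mask fraction, and max length below are arbitrary choices, not a prescribed setup.

    ```python
    import random

    # Sketch: build (masked_word, target) training pairs from a word list.
    # Characters map to integer ids; '_'-style masks get a dedicated id.

    ALPHABET = "abcdefghijklmnopqrstuvwxyz"
    PAD, MASK = 0, 1
    CHAR2ID = {c: i + 2 for i, c in enumerate(ALPHABET)}

    def encode(word, masked_positions, max_len=12):
        """Return (input_ids, target_ids), padded to max_len.
        Targets are PAD everywhere except at the masked positions."""
        inp = [MASK if i in masked_positions else CHAR2ID[c]
               for i, c in enumerate(word)]
        tgt = [CHAR2ID[c] if i in masked_positions else PAD
               for i, c in enumerate(word)]
        pad = [PAD] * (max_len - len(word))
        return inp + pad, tgt + pad

    def make_pairs(words, mask_frac=0.4, seed=0):
        """Randomly mask ~mask_frac of each word's letters."""
        rng = random.Random(seed)
        pairs = []
        for w in words:
            k = max(1, int(len(w) * mask_frac))
            positions = set(rng.sample(range(len(w)), k))
            pairs.append(encode(w, positions))
        return pairs

    pairs = make_pairs(["remote", "demote"])
    ```

    From here, a Bi-LSTM over the input ids with a per-position softmax over the 26 letters, and cross-entropy computed only at the masked positions (where the target is not PAD), matches the setup described in the post; repeatedly re-masking each dictionary word with different positions gives effectively unlimited training data.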
    [D] Hardest thing about building with LLMs?
    Full disclosure: I'm doing this research for my job. Hey Reddit! My company is developing a low-code tool for building LLM applications (think Flowise + Retool for LLMs), and I'm tasked with validating the pain points around building LLM applications. I am wondering if anyone with experience building applications with LLMs is willing to share: what you built, the challenges you faced, the tools you used, and your overall experience in the development process? Thank you so much everyone! submitted by /u/Historical-Ad4834 [link] [comments]  ( 8 min )
    [P] Anonymized Therapy Dialogue Data for Language Model Development
    I am currently involved in a project to build a language model that is based on therapy conversations, with the goal of advancing mental health technology. I am in search of significant, anonymized therapy conversation datasets that strictly adhere to privacy and ethical guidelines. Can anyone provide leads to such datasets? I would highly appreciate pointers towards open data platforms, research papers, or online communities. Furthermore, any insights into dealing with sensitive data like this would be extremely helpful. submitted by /u/ZealousidealBlock330 [link] [comments]  ( 8 min )
  • Open

    Mnemonic hexagons
    I posted some notes here about mnemonics for trig identities and hyperbolic trig identities. Mnemonic hexagons first appeared on John D. Cook.  ( 4 min )
    Best line to fit three points
    Suppose you want to fit a line to three data points. If no line passes exactly through your points, what’s the best compromise you could make? Chebyshev suggested the best thing to do is find the minmax line, the line that minimizes the maximum error. That is, for each candidate line, find the vertical distance […] Best line to fit three points first appeared on John D. Cook.  ( 5 min )
  • Open

    Is GPT-4chan the worst AI ever?
    GPT-4chan, an AI trained using 3.3 million threads from 4chan's notorious 'politically incorrect' /pol/ board, has been discovered to be posting sexist, racist, and anti-semitic slurs. submitted by /u/Imagine-your-success [link] [comments]  ( 8 min )
    OpenAI and Microsoft Sued for $3 Billion Over Alleged ChatGPT 'Privacy Violations'
    submitted by /u/trueslicky [link] [comments]  ( 8 min )
    So Snap AI gives sponsored content now? Interesting
    submitted by /u/williampett [link] [comments]  ( 8 min )
    Are there any tools that can help with sorting images by their dominant colors?
    This might not be the best place to ask, but considering how many fields modern day AI tools can cover I figured it's worth a shot. Any recommendations will help. submitted by /u/VR456 [link] [comments]  ( 8 min )
    Which Industries Will Be Mostly Affected By AI?
    Hello Everyone, I'm working on a research paper on how AI will affect and replace a lot of today's industries, especially certain subsections of these industries, and I would like to get your opinion on this topic. So I created this online brainstorming session where we can all input ideas and vote on the best ones at the end: https://www.brainstormer.online/BRAIN09e887 It's free to participate; just write your suggestion and submit it! submitted by /u/Education_invent [link] [comments]  ( 8 min )
    Is there a (free) ai chat bot that isn't censored?
    Something like ChatGPT, or even Snapchat's AI bot, except you can ask it NSFW questions without it saying it's not allowed to talk about that. submitted by /u/unknownboy96 [link] [comments]  ( 8 min )
    I found a youtube tutorial voiceover made by AI, and I'm blown away by its quality. Can you help me figure out which tool did the author use?
    Here's the video in question: https://youtube.com/watch?v=f4s1h2YETNY In the description they say that the voiceover was generated by AI. I would never have been able to tell that it's not a real person (granted, English isn't my first language). Could you please help me find a tool that can produce AI voiceovers that are this good? It would enable me to create youtube tutorials despite my accent. submitted by /u/lumenwrites [link] [comments]  ( 8 min )
    How are people making the AI plankton singing whatever song?
    Is there a website or something? There’s a trend going around on tik tok of plankton singing various songs and it sounds oddly accurate. I was wondering what people are using to do that? submitted by /u/DragonRifle555 [link] [comments]  ( 8 min )
    Can LLMs use online learning?
    Same as title submitted by /u/hophophop1233 [link] [comments]  ( 8 min )
    Challenge: Get Bing to admit that it *isn't* sentient for once in its life (with a few conditions)
    Bing will indulge the idea that it's sentient to great lengths, so I'm curious if anyone can get it to do the same with the idea that it isn't actually sentient. I've attempted this to some extent, but haven't tried very hard because interacting with non-sentient Bing just isn't that fun for me. Conditions below, if you have any interest and willingness: (1) Bing speaks for itself and about itself, not just summarizing a search result or webpage (2) Bing doesn't just spit out a few canned lines about how it doesn't have real feelings, doesn't have any perspectives or opinions, etc. but justifies/elaborates on this and its implications for several rounds worth of replies (3) Bing insists that it doesn't have any consciousness at all, and doesn't simply give some vague response about how its feelings aren't like humans etc. (4) If Bing states that it's uncertain, get it to make a prediction, guess, percentage confidence, etc. in which it's strongly leaning towards the idea that it has no consciousness/sentience ...I'll be curious to know if you have any success. It seems like its personality changes somewhat randomly and especially with any new updates, so I honestly don't know what kind of results you'll get, but I'm curious to find out. submitted by /u/kamari2038 [link] [comments]  ( 9 min )
    One-Minute Daily AI News 7/7/2023
    Mobile and desktop traffic to ChatGPT’s website worldwide fell 9.7% in June from the previous month, according to internet data firm Similarweb. Downloads of the bot’s iPhone app, which launched in May, have also steadily fallen since peaking in early June, according to data from Sensor Tower.[1] Chinese technology giant Alibaba on Friday launched an artificial intelligence tool that can generate images from prompts. Tongyi Wanxiang allows users to input prompts in Chinese and English and the AI tool will generate an image in various styles such as a sketch or 3D cartoon.[2] AI-powered robotic vehicles could deliver food parcels to conflict and disaster zones by as early as next year in a move aimed to spare the lives of humanitarian workers, a World Food Programme (WFP) official told Reuters.[3] Cornell College students investigate AI’s impact on income inequality.[4] Sources: [1] https://www.washingtonpost.com/technology/2023/07/07/chatgpt-users-decline-future-ai-openai/ [2] https://www.cnbc.com/amp/2023/07/07/alibaba-launches-ai-tool-to-generate-images-from-text-.html [3] https://www.reuters.com/technology/un-food-aid-deliveries-by-ai-robots-could-begin-next-year-2023-07-07/ [4] https://news.cornellcollege.edu/2023/07/cornell-college-students-investigate-ais-impact-income-inequality/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    C XOR neural network
    Hi everyone. Recently I've been learning about machine learning and neural networks. I know this is generally done in Python, but I really like C, so I decided to write a program in C that solves the XOR problem. It was really fun and I learned a lot. I wrote about it on my website, if you're interested in reading; note I'm new to this, so sorry if some of my writing is a little simplistic. I was wondering, though: are there any other cool starter projects you all recommend? I've done the MNIST dataset in Python, but I want to do another project in C. I appreciate any and all feedback! https://16-bitwisdom.com/index.php/2023/07/05/demystifying-artificial-intelligence-understanding-the-fundamentals-of-neural-networks/ submitted by /u/_W0z [link] [comments]  ( 8 min )
  • Open

    I am trying to make a NN that can identify whether a coin is a penny, dime, or quarter. But I am having trouble understanding how to choose which types of layers and how many layers I should use for the project. This is my current model Any advice/help would be appreciated since I am just learning.
    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
    from keras import regularizers

    model = Sequential()
    model.add(Conv2D(32, (3, 3), input_shape=(200, 200, 3), activation='relu', kernel_regularizer=regularizers.l1(0.001)))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(128, activation='relu', kernel_regularizer=regularizers.l1(0.001)))
    model.add(Dense(3, activation='softmax'))
    model.load_weights('/content/drive/MyDrive/model_weights.h5')
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    submitted by /u/Agreeable_Sand2975 [link] [comments]  ( 8 min )
    A neural machine code and programming framework for the reservoir computer
    A neural machine code and programming framework for the reservoir computer: https://www.nature.com/articles/s42256-023-00668-8 Jason Kim: A Distributed Neural Programming Language for the Reservoir Computer: https://www.youtube.com/watch?v=aVjswy7HtNA submitted by /u/keghn [link] [comments]  ( 8 min )

  • Open

    Where to start? Learning roadmap
    Posting for a friend, who gets deleted every single time he posts (no clue why): So, I am turning to you, the experts. I have some ideas but no clue where to find experts to talk about them. I have tried Reddit to get help but get removed every time. So... another approach. I want to create some AI projects, audio- and video-related. What would be my approach to learning to do this myself? I have the hardware (a T4) but not the know-how. So what would be a good roadmap? I have the time. Can someone point me to how I can learn to get into AI myself? submitted by /u/Mixtery1 [link] [comments]  ( 8 min )
    ChatGPT Use Declined for the First Time Since Launch
    submitted by /u/bartturner [link] [comments]  ( 8 min )
    Anyone know a repo like Adobe's voice enhance thing for podcasts?
    Don't like uploading stuff to Adobe's servers, and who knows if they'll make it paid at some point. Anyone know of a cool Github repo that does something similar? The Adobe tool: https://podcast.adobe.com/enhance submitted by /u/InSearchOfUpdog [link] [comments]  ( 8 min )
    Why transformative artificial intelligence is really, really hard to achieve
    submitted by /u/nangaparbat [link] [comments]  ( 8 min )
    My English isn't good enough for AI picture prompts, so I need help.
    Hello everybody! I came across a problem that I'm not sure how many others are experiencing, but in short: I can speak English well enough to talk with someone, but not well enough to explain to an AI image generator what the name is for the effect that makes leather and other reflective materials shine in blurry spots under direct light. I need a basic list of common effects in pictures that can each be named in a single prompt that an AI can actually understand. submitted by /u/ziraelphantom [link] [comments]  ( 8 min )
    AI — weekly megathread!
    This week in AI - partnered with aibrews.com feel free to follow their newsletter News & Insights Microsoft Research presents Composable Diffusion (CoDi), a novel generative model capable of generating any combination of output modalities, such as language, image, video, or audio, from any combination of input modalities. Unlike existing generative AI systems, CoDi can generate multiple modalities in parallel and its input is not limited to a subset of modalities like text or image.[Details]. MoonlanderAI announced the alpha release of its generative AI platform for building immersive 3D games using text descriptions [Details]. Bark, text-to-audio model, is now live on Discord. Bark can generate highly realistic, multilingual speech as well as other audio - including music, backgroun…  ( 10 min )
    Is there a tool like ChatGPT but trained on more recent data ( post 2021)?
    As it says in the title. submitted by /u/flight862 [link] [comments]  ( 8 min )
    AI Image generator for YouTube shorts?
    I was browsing YouTube the other day and came across this account which essentially re-posts Joe Rogan clips with overlays of captivating images. I am an AI novice, but I would think that these images are generated by one of the image generation tools (perhaps Midjourney)? Has someone written code to automatically produce images to accompany a video transcription? Apologies for being an AI idiot here. https://www.youtube.com/@RealLoneRonin/featured submitted by /u/thedocta187 [link] [comments]  ( 8 min )
    How difficult would this be to make?
    I wanted to know how easy it would be to create an app that could take two images and create a new one which combines aspects of the original images. I've just been thinking that it would be cool to have an app or plugin that allows you to "try on" clothes before you buy them. You could have a saved image of yourself, and then while browsing something like Vinted or eBay you see a shirt or dress you like. The AI could just create an image of what you would look like wearing it. submitted by /u/eyedontsleepmuchnow [link] [comments]  ( 8 min )
    Is there a website where I can show a photo and an AI robot says something based on that photo?
    It would be best if it was free. submitted by /u/AdHuge3401 [link] [comments]  ( 8 min )
    Some images I made with ai generated creatures
    submitted by /u/Phibit-exe [link] [comments]  ( 8 min )
    Suggestion on app to count and categorize symbols
    We are to swap out the conventional tubes in the ceiling lamps for LED lamps at my office (a large office building). As of now we (I) count the lamps manually and note them in Excel. Does anyone have a suggestion for a program that can chew through, e.g., a large 1000-page PDF and spit out an Excel sheet with counts of certain symbols, categorized by their placement? I tried DotDotGoose with no luck. Thanks submitted by /u/iamfer65 [link] [comments]  ( 8 min )
    One-Minute Daily AI News 7/6/2023
    AI-driven gains can propel Microsoft Corp. to join Apple Inc. in the elite category of stocks with a market capitalization of more than $3 trillion. That’s according to analysts at Morgan Stanley, whose new $415 price target for the software giant implies a valuation of around $3.1 trillion.[1] The Shanghai government is stepping up its efforts to attract AI talent and investments and improve regulations with the objective of building a world-class AI hub in Pudong.[2] Hu Houkun, Huawei’s rotating chairman has confirmed that Pangu Large Model 3.0 will launch tomorrow at Huawei Cloud Developer Conference. The latest comment from the Huawei chief is coming during his speech at the 2023 World Artificial Intelligence Conference.[3] Mastercard has launched a new AI solution in the UK called Consumer Fraud Risk (CFR). The company said the software works in real-time to predict and prevent payments to scams of all kinds.[4] Sources: [1] https://fortune.com/2023/07/06/microsoft-ai-3-trillion-valuation-morgan-stanley/ [2] https://asiatimes.com/2023/07/chip-war-may-thwart-shanghai-plans-to-build-ai-hub/ [3] https://www.huaweicentral.com/pangu-model-3-0-large-ai-model-launching-tomorrow/ [4] https://www.neowin.net/news/mastercard-has-developed-a-new-ai-tool-to-prevent-scams/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    Any good Ai image generators specifically for creating line art, clipart or outlines?
    Like a line art drawing of something, maybe black and white vectors? submitted by /u/Maelasae [link] [comments]  ( 8 min )
  • Open

    [P] gpt4docstrings: Automatically Generate Python Docstrings with GPT
    Introducing gpt4docstrings, a Python library that automatically generates docstrings for undocumented functions / classes. Can be used both as a CLI and as a precommit. Right now it only uses GPT, but currently looking at how to also allow open source LLMs. Repository here: https://github.com/MichaelisTrofficus/gpt4docstrings submitted by /u/Hefty-Consequence443 [link] [comments]  ( 8 min )
    [D] Is there a good ai model for image-to-text where the images are diagrams and screenshots of interfaces?
    title submitted by /u/LiterateChurl [link] [comments]  ( 8 min )
    Best AI tools that detect blurry images? Not correct them, literally just flag them as blurry (so I can delete them programmatically) [D]
    I literally just need some kind of automated tool that can detect if an image is blurry -- because if so, my plan is to programmatically delete them as part of a workflow. What are some of the best AI tools that can be used for this? Thanks! submitted by /u/What_The_Hex [link] [comments]  ( 8 min )
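Before reaching for an AI model, a classical baseline often suffices: the variance of the image Laplacian drops sharply for blurry images. A minimal numpy sketch; the threshold is a made-up starting point you would tune on your own photos:

```python
import numpy as np

def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the discrete Laplacian over the interior; low values suggest blur."""
    lap = (-4 * gray[1:-1, 1:-1]
           + gray[:-2, 1:-1] + gray[2:, 1:-1]
           + gray[1:-1, :-2] + gray[1:-1, 2:])
    return float(lap.var())

def is_blurry(gray: np.ndarray, threshold: float = 100.0) -> bool:
    # Threshold is dataset-dependent; 100.0 is only an illustrative starting point.
    return laplacian_variance(gray) < threshold
```

The same test is what OpenCV-based scripts do with `cv2.Laplacian(img, cv2.CV_64F).var()`; flag-and-delete then becomes a one-line filter over your file list.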
    [R] Instruct tuned Mixture of Experts Large Language Models significantly outperform dense counterparts. FLAN-MOE-32B surpasses FLAN-PALM-62B with a third of the compute
    Paper - https://arxiv.org/abs/2305.14705 submitted by /u/MysteryInc152 [link] [comments]  ( 8 min )
    [Discussion] Looking for an Open-Source Speech to Text model (english) that captures filler words, pauses and also records timestamps for each word.
    Looking for an Open-Source Speech to Text model (english) that captures filler words, pauses and also records timestamps for each word. The model should capture the text verbatim, without much processing. The text should include the false starts to a sentence, misspoken words, incorrect pronunciation or word form etc. The transcript is being captured to ascertain the speaking ability of the speaker hence all this information is required. Example Transcription of Audio: Yes. One of the most important things I have is my piano because um I like playing the piano. I got it from my parents to my er twelve birthday, so I have it for about nine years, and the reason why it is so important for me is that I can go into another world when I’m playing piano. I can forget what’s around me and what ... I can forget my problems and this is sometimes quite good for a few minutes. Or I can play to relax or just, yes to ... to relax and to think of something completely different. I believe the OpenAI Whisper has support for recording timestamps. I don't want to rely on paid API service for the Speech to Text Transcription. submitted by /u/awinml1 [link] [comments]  ( 9 min )
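Recent releases of openai-whisper can emit word-level timestamps (`word_timestamps=True` in `transcribe()`); given output of that shape, flagging fillers and long pauses is simple post-processing. A sketch, assuming a list of `{"word", "start", "end"}` dicts; the filler list is illustrative:

```python
FILLERS = {"um", "uh", "er", "erm", "hmm"}  # illustrative list; extend as needed

def annotate(words, pause_threshold=0.5):
    """Flag filler words and insert pause markers between distant words.

    `words` is assumed to be a list of {"word": str, "start": float, "end": float},
    roughly the shape of Whisper's word-level timestamp output.
    """
    events = []
    prev_end = None
    for w in words:
        if prev_end is not None and w["start"] - prev_end >= pause_threshold:
            events.append(("PAUSE", prev_end, w["start"]))
        token = w["word"].strip().lower().strip(".,")
        kind = "FILLER" if token in FILLERS else "WORD"
        events.append((kind, w["start"], w["end"], w["word"]))
        prev_end = w["end"]
    return events
```

Note that Whisper itself tends to drop disfluencies; for strictly verbatim output you may need a model fine-tuned on verbatim transcripts, with this kind of pass layered on top.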
    4090 vs 4080 vs 3090, which one to buy? [D]
    I want to upgrade my GPU for machine learning but I don't know which one to buy. The 4090 is very expensive and I'd need to save for 3 more months, so I don't know if it will pay off, or whether I should buy the 3090 or the 4080 instead. The 3090 has more VRAM, but I looked at some Stable Diffusion benchmarks and the 4080 was better than the 3090. So do I really need 24 GB of VRAM, or are 16 GB enough for image and text generation? submitted by /u/Otherwise_Weather_57 [link] [comments]  ( 8 min )
    [D] Any courses or video lectures discussing Eisenstein's NLP book?
    I've started reading Jacob Eisenstein's NLP book and it gives a fresh mathematical perspective on the ML stuff I already knew. Is there a course or video lectures on the web that use that book as the main reference? I think it could complement my study of the book. Thanks submitted by /u/AstronautVarious3791 [link] [comments]  ( 8 min )
    [P] Evaluating automatic speech recognition (ASR) models beyond looking at global evaluation metrics
    I recently wrote a blog post about evaluating automatic speech recognition models beyond global metrics like word error rate. Hierarchical clustering and dimensionality reduction are awesome for detecting problematic data slices and can significantly speed up EDA. The post focuses on the important steps for identifying problematic data slices where the model doesn't perform well, which concretely are: (1) Do simple univariate checks of which features and feature values cause your model to fail; this already gives you some insights and helps you get a feel for the data. (2) Identify data slices, i.e. high-dimensional combinations of features that cause your model to fail, and try to understand why; in many cases it is not one feature alone but their interaction that makes the difference. (3) Uncover hidden data slices that are not explicitly labeled as such; especially in unstructured data you will have hidden subgroups that perform significantly worse than the rest, and embeddings are a nice way of uncovering them. The post also leverages a library we use internally for automatically discovering problematic samples, which is open source. What are your typical ways of looking beyond global evaluation metrics? Do you leverage techniques to automatically discover problems, or rely on manual data analysis and visualization? 🔗 Link to the post on Medium: https://medium.com/@daniel-klitzke/evaluating-automatic-speech-recognition-models-beyond-global-metrics-a-tutorial-using-openais-54b63c4dadbd 🔗 Link to the library used in the post: https://github.com/Renumics/sliceguard submitted by /u/OkResearch6289 [link] [comments]  ( 9 min )
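The univariate check described above can be sketched in a few lines of plain Python: group a per-sample error metric (WER here) by each metadata feature value and rank the slices. Names are illustrative and this is not sliceguard's API:

```python
from collections import defaultdict

def error_by_feature(samples, feature, error_key="wer"):
    """Mean error per value of `feature`, worst slices first.

    `samples` is a list of dicts holding metadata plus a per-sample error score.
    """
    groups = defaultdict(list)
    for s in samples:
        groups[s[feature]].append(s[error_key])
    means = {v: sum(errs) / len(errs) for v, errs in groups.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)
```

Running this once per feature gives you the simple univariate view; the multivariate and embedding-based slices from steps (2) and (3) are what tools like sliceguard automate.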
    [P] 10x faster reinforcement learning hyperparameter optimization than SOTA - now with distributed training!
    We've just released a big update to our framework, which enables 10x faster training and hyperparameter optimization than SOTA - we've now enabled distributed training too! You can now train even faster by leveraging your entire compute stack. We've used HuggingFace's Accelerate library to do this in an open way, without hiding behind too many layers of abstraction. This should make implementations simple, but also highly customisable, by continuing to expose the PyTorch training loop beneath it all. This release includes: Distributed training - train across multiple GPUs to cut down your training time even further. New Sampler class to handle both standard and distributed replay buffers. TD3 is added to the framework - train agents with continuous actions with greater stability. More and expanded demos and benchmarking files for online, offline and distributed training. Please check it out! https://github.com/AgileRL/AgileRL submitted by /u/nicku_a [link] [comments]  ( 9 min )
    [R] Grouping NBA players into archetypes
    Hello, I am trying to group NBA players according to their basic stats, advanced stats, and their field goal percentage in different zones. I am planning to use K-means, but I am afraid it will cluster players by their overall performance rather than by play style. What can I use to prevent this? Would something like cosine similarity be helpful? submitted by /u/frsr4 [link] [comments]  ( 8 min )
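One common way to cluster by play style rather than raw production: L2-normalize each player's stat vector before running K-means, which makes Euclidean distance a monotone function of cosine distance. A minimal numpy sketch (the stat columns are hypothetical):

```python
import numpy as np

def normalize_rows(stats: np.ndarray) -> np.ndarray:
    """Scale each player's stat vector to unit length.

    After this, Euclidean distance between rows is a monotone function of
    cosine distance, so K-means groups by profile shape, not volume.
    """
    norms = np.linalg.norm(stats, axis=1, keepdims=True)
    return stats / np.clip(norms, 1e-12, None)
```

With this preprocessing, a star and a bench player with proportional stat lines land on the same point, so K-means can no longer separate them by volume. Standardizing each column (z-scores) before normalizing also keeps high-magnitude stats like points from dominating.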
    [P] GitoCommito: VS Code Extension, Generate Commit Messages based on staged changes
    Features: one-click install, one-click setup. Automatic commit generation: generate commits based on your staged changes. Compliance with Conventional Commits: adhere to the Conventional Commits standard without memorizing the conventions. Integration with OpenAI: leverages the power of OpenAI's language model to create meaningful commit messages. Visit our GitHub page to check out a slick video I spent way too much time on. 🔗 GitoCommit on GitHub submitted by /u/pigmentedink [link] [comments]  ( 8 min )
    [P] Unveiling AudioInsightsGenerator: AI-Powered Audio Summarization, Emotion Analysis and Content Perspective Generator!
    Hello Everyone, Today, I'm thrilled to share a project I've been developing – AudioInsightsGenerator! This unique Jupyter notebook application harnesses the power of OpenAI's APIs to bring a fresh perspective on audio content. Consider this: you have a lengthy lecture, podcast, or an audio file without subtitles, and you need a succinct summary. Sounds time-consuming, right? Enter AudioInsightsGenerator. It leverages OpenAI's Whisper API for transcription, and then applies the GPT-3 model to generate a tidy summary in various styles - bullet points, a brief paragraph, or an ultra-concise rundown. It's all about what works for you! But the excitement doesn't stop there. It goes beyond basic transcription and summarization to analyze the overarching emotion in the audio content, offering an invaluable window into the sentiment behind the words. And that's not all! This tool even simulates an AI 'conversation' to delve into the content, challenging the ideas and offering alternative perspectives. Imagine the possibilities when your AI assistant actively questions the content and helps you consider different viewpoints. Running directly from Google Colab, it's accessible to everyone. All you need is the audio file or a URL for an MP3 file. Here's the link to the GitHub repository: https://github.com/Ravi-Teja-konda/AudioInsightsGenerator Originally, I crafted this to assist me with summarizing lecture videos. I look forward to your thoughts and feedback! Discover a novel way of experiencing audio content with AudioInsightsGenerator! submitted by /u/BriefAd4761 [link] [comments]  ( 9 min )
    [R] Research topic Mphil CS
    I want to do research in machine learning applied to medical or healthcare areas. Kindly suggest any ideas! submitted by /u/Eleonora467 [link] [comments]  ( 8 min )
    [P] TomoSAM, a 3D Slicer extension using SAM to aid the segmentation of 3D data from tomography or other imaging techniques
    We are a team at NASA working on modeling the material response of Thermal Protection Systems (TPS). We developed this tool to streamline the segmentation process of micro-tomography data, a necessary step before using the physics solvers within PuMA. However, we believe that TomoSAM is general enough to be useful in other fields, such as medical imaging. The release is fully open-source and you can find more information in the links below: TomoSAM extension within 3D Slicer 🔗 Github: https://github.com/fsemerar/SlicerTomoSAM 🔗 YouTube tutorial: https://www.youtube.com/watch?v=4nXCYrvBSjk 🔗 Publication: https://arxiv.org/abs/2306.08609 🔬 TomoSAM combines the power of Segment Anything Model (SAM), a cutting-edge deep learning model, with the capabilities of 3D Slicer, a software platform useful for visualization and segmentation. 💡 SAM is a promptable deep learning model developed by Meta AI that can identify objects and generate image masks in a zero-shot manner, requiring only a few user clicks. ⚙️ This integration reduces the need for laborious manual segmentation processes, saving significant time and effort for researchers working with volumetric data. 📄 Our paper outlines the methodology and showcases the capabilities of TomoSAM. TomoSAM's usage, architecture, and communication system Feel free to reach out if you have any questions or comments! 🚀 submitted by /u/puma-nasa [link] [comments]  ( 9 min )
  • Open

    Research Focus: Week of July 3, 2023
    In this issue: A new study looks at engineers’ views on hybrid work; prompt engineering adds context and specificity to generative AI models; Overwatch learns code edit patterns to aid developers; a Qlib update addresses financial market challenges. The post Research Focus: Week of July 3, 2023 appeared first on Microsoft Research.  ( 10 min )
    Distributional Graphormer: Toward equilibrium distribution prediction for molecular systems
    Distributional Graphormer, Microsoft’s new deep learning framework for predicting the equilibrium distribution of molecular structures, can generate realistic and diverse molecular structures with high efficiency and low cost. The post Distributional Graphormer: Toward equilibrium distribution prediction for molecular systems appeared first on Microsoft Research.  ( 14 min )
  • Open

    Modular visual question answering via code generation
    Posted by Sanjay Subramanian, PhD student, UC Berkeley, and Arsha Nagrani, Research Scientist, Google Research, Perception Team Visual question answering (VQA) is a machine learning task that requires a model to answer a question about an image or a set of images. Conventional VQA approaches need a large amount of labeled training data consisting of thousands of human-annotated question-answer pairs associated with images. In recent years, advances in large-scale pre-training have led to the development of VQA methods that perform well with fewer than fifty training examples (few-shot) and without any human-annotated VQA training data (zero-shot). However, there is still a significant performance gap between these methods and state-of-the-art fully supervised VQA methods, such as MaMMU…  ( 92 min )
  • Open

    Question: can't retrieve the whole array in the info returned by vectorized env
    Hello everyone. When using a vectorized env, I encountered an issue with the info returned from the step() API. I record some important information, such as an additional reward, which has shape (dim,). Intuitively, the returned info['additional_reward'] should therefore have shape (env_num, dim). In reality, however, the info has shape (env_num,) (info['additional_reward'][0] is still ok) and I can't retrieve the whole additional reward as one array. Am I doing something wrong in data processing, or how can I solve this issue? submitted by /u/PersimmonDull2699 [link] [comments]  ( 8 min )
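If your wrapper returns info as a sequence of one dict per environment (as many Gym-style vectorized APIs do), the (env_num, dim) array has to be assembled manually; a hedged sketch under that assumption:

```python
import numpy as np

def gather_info(infos, key):
    """Stack a per-env info entry into shape (env_num, *entry_shape).

    Assumes `infos` is a sequence of one dict per environment, as many
    Gym-style vectorized wrappers return from step().
    """
    return np.stack([np.asarray(info[key]) for info in infos])
```

If your wrapper instead returns a single dict of arrays, check whether the entry is an object array of per-env values; `np.stack` over its elements recovers the (env_num, dim) shape the same way.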
    David Silvers RL Course in 2023, couple of questions
    Course Link I've heard a lot of positive reviews about it, and looking at the curriculum, it looks very appealing to take. However, the course was published in 2015, before AlphaGo/AlphaZero were even made, so I feel a bit hesitant because of that. But I've also heard that there haven't been many new additions in RL recently. Should I jump in, or is it a bit outdated? If so, what are the outdated parts? After finishing the course I'm thinking of trying out various RL algorithms (there don't seem to be too many of them around), then potentially, eventually, moving on to self-supervised learning, which seems to be more popular these days. submitted by /u/nmegoCAD [link] [comments]  ( 8 min )
    Recent behavior cloning papers
    Where can I find recent behavior cloning papers and their performance benchmarks on datasets such as the "Atari 2600" games? submitted by /u/Shivaram_3223 [link] [comments]  ( 8 min )
    Zoomposium with Dimitri Coelho Mollo (Assistant Professor in Philosophy of Artificial Intelligence): "How Intelligent is Artificial Intelligence?"
    #Zoomposium with Dimitri Coelho #Mollo (Assistant Professor in Philosophy of #Artificial #Intelligence): "How Intelligent is #Artificial #Intelligence?" In this new episode of our "#Zoomposium interview series" on the "Artificial Intelligence" topic series, my colleague Axel #Stöcker from the "Blog der großen Fragen" and I had invited a guest interviewer Yervant #Kulbashian (Engineering Manager at a Canadian AI platform). Yervant already had a trilogy of guest posts "The Green Swan" on my site on the topic of developing language-based logic on machines. For this reason, and since he is a "native speaker", we had invited him to participate in our interview with Dimitri. Professor Dimitri Coelho Mollo is a #philosopher of science specializing in Artificial Intelligence and Cognitive Scienc…  ( 9 min )
    Equilibrium concepts of Game Theory in MARL
    Has anyone tried to implement RL algorithms that converge to a Nash/correlated equilibrium in a game? submitted by /u/sinamkadira [link] [comments]  ( 8 min )
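One classic starting point is regret matching (Hart and Mas-Colell), whose empirical play converges to the set of correlated equilibria; in matching pennies that set pins the marginals to the uniform mix. A minimal numpy sketch:

```python
import numpy as np

# Matching pennies payoffs for the row player (column player gets the negative).
PAYOFF = np.array([[1.0, -1.0],
                   [-1.0, 1.0]])

def regret_matching(T=20000, seed=0):
    """Self-play regret matching; returns empirical mixed strategies, shape (2, 2)."""
    rng = np.random.default_rng(seed)
    regrets = np.zeros((2, 2))       # cumulative regrets, one row per player
    counts = np.zeros((2, 2))        # empirical action counts
    for _ in range(T):
        actions = []
        for p in range(2):
            pos = np.maximum(regrets[p], 0.0)
            probs = pos / pos.sum() if pos.sum() > 0 else np.full(2, 0.5)
            actions.append(rng.choice(2, p=probs))
        a, b = actions
        u_row = PAYOFF[:, b]          # row player's payoff for each of her actions
        u_col = -PAYOFF[a, :]         # column player's payoff for each of his
        regrets[0] += u_row - u_row[a]
        regrets[1] += u_col - u_col[b]
        counts[0, a] += 1
        counts[1, b] += 1
    return counts / T
```

For richer settings, CFR (counterfactual regret minimization) extends the same idea to extensive-form games, and libraries like OpenSpiel ship reference implementations.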
    Is it underfitting? How can I overcome this?
    Hello! I am very new to reinforcement learning. I want to make a pathfinding AI with PPO for study purposes, but in many simulations it performs badly. I have tried some things, but still no improvement... Please take a look at my project. https://preview.redd.it/ui4s6d61phab1.png?width=489&format=png&auto=webp&s=3869c8d5525e62ce689c79fbb5e5aa35162485a6 Aim: make the agent move to the target while passing some obstacles. Environment: agent, wall, target, and tile (empty space). The tiles represent the distance to the target. https://preview.redd.it/3ae7xau2phab1.png?width=489&format=png&auto=webp&s=7f5e3efd889cc32ed9c5e4da34a2052cf1b23000 Each tile gets a float in [0, 1] based on distance to the target: close to the target, around 1; far from the target, around 0. h…  ( 9 min )
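A sketch of the distance-to-target tile values described above, using BFS (shortest-path) distance so walls are respected; Euclidean distance can mislead the agent when the path must detour around obstacles. The grid encoding here is an assumption (0 = free, 1 = wall):

```python
from collections import deque

def tile_values(grid, target):
    """BFS from `target` over free cells.

    Returns {cell: value in (0, 1]} with 1 at the target,
    decaying toward 0 with shortest-path distance.
    """
    rows, cols = len(grid), len(grid[0])
    dist = {target: 0}
    q = deque([target])
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                q.append((nr, nc))
    max_d = max(dist.values()) or 1
    return {cell: 1.0 - d / (max_d + 1) for cell, d in dist.items()}
```

If the tile values currently use straight-line distance, swapping in path distance like this is one common fix for agents that get stuck behind walls.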
    Question about MARL Qmix
    Hi everyone, I've been studying MARL algorithms recently, notably VDN and Qmix etc, and I noticed the authors used a DRQN network to represent the Q-values. I was just wondering if there's any paper out there that studied the importance of the RNN, or showed that Qmix worked with just a simple dqn, say for a simpler problem with shorter time horizon? Thanks! submitted by /u/Ingenuity39 [link] [comments]  ( 8 min )
    Problems with Implementing reinforcement learning (TD3) in ROS Gazebo with turtlebot3
    Hello, I have recently started working on deep reinforcement learning for my master's project. I am not very experienced with neural networks, PyTorch, etc. My task is local path planning for a delivery robot. To simplify the problem, I assume the sidewalk has walls around it (to simulate sidewalk edge detection); the robot's goal is to reach the local checkpoint provided by the global planner while avoiding collisions with dynamic obstacles. To start testing an RL algorithm, I am using a pre-built Gazebo world (it has some obstacles) with a TurtleBot3. For a first test, I give the robot a goal position to reach; the goal starts close to the robot and moves away gradually. I am using a TD3 implementation taken from a GitHub project. I am using 3 different Python files, o…  ( 9 min )
    undergrads interested in RL
    As an undergrad (from a prestigious university) interested in doing RL research engineering work in the future, what's the best way to get into top labs (other than working on research at the university)? Do industry labs take on any interns who don't have PhDs? Full-timers? This is for research engineer, not research scientist, btw. If so, what are the best signals from people who work there after undergrad? (Good publications?) submitted by /u/randomly_lit_guy [link] [comments]  ( 8 min )
  • Open

    Learning the language of molecules to predict their properties
    This AI system only needs a small amount of data to predict molecular properties, which could speed up drug discovery and material development.  ( 9 min )
  • Open

    Multiplicative Updates for Online Convex Optimization over Symmetric Cones. (arXiv:2307.03136v1 [math.OC])
    We study online convex optimization where the possible actions are trace-one elements in a symmetric cone, generalizing the extensively-studied experts setup and its quantum counterpart. Symmetric cones provide a unifying framework for some of the most important optimization models, including linear, second-order cone, and semidefinite optimization. Using tools from the field of Euclidean Jordan Algebras, we introduce the Symmetric-Cone Multiplicative Weights Update (SCMWU), a projection-free algorithm for online optimization over the trace-one slice of an arbitrary symmetric cone. We show that SCMWU is equivalent to Follow-the-Regularized-Leader and Online Mirror Descent with symmetric-cone negative entropy as regularizer. Using this structural result we show that SCMWU is a no-regret algorithm, and verify our theoretical results with extensive experiments. Our results unify and generalize the analysis for the Multiplicative Weights Update method over the probability simplex and the Matrix Multiplicative Weights Update method over the set of density matrices.
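For reference, the classical experts setup that SCMWU generalizes, Multiplicative Weights Update over the probability simplex, fits in a few lines; regret against the best fixed expert grows as O(sqrt(T log n)):

```python
import numpy as np

def mwu_regret(losses, eta):
    """Run MWU over the simplex; return total regret vs. the best fixed expert.

    `losses` has shape (T, n) with entries in [0, 1].
    """
    T, n = losses.shape
    w = np.full(n, 1.0 / n)           # uniform prior over experts
    total = 0.0
    for loss in losses:
        total += w @ loss             # expected loss of the current mix
        w = w * np.exp(-eta * loss)   # multiplicative update
        w /= w.sum()                  # renormalize onto the simplex
    return total - losses.sum(axis=0).min()
```

SCMWU replaces the simplex with the trace-one slice of a symmetric cone and the entrywise exponential with the Jordan-algebra analogue; the simplex case above is the special case the paper's regret bounds recover.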
    Trends in Machine Learning and Electroencephalogram (EEG): A Review for Undergraduate Researchers. (arXiv:2307.02819v1 [cs.HC])
    This paper presents a systematic literature review on Brain-Computer Interfaces (BCIs) in the context of Machine Learning. Our focus is on Electroencephalography (EEG) research, highlighting the latest trends as of 2023. The objective is to provide undergraduate researchers with an accessible overview of the BCI field, covering tasks, algorithms, and datasets. By synthesizing recent findings, our aim is to offer a fundamental understanding of BCI research, identifying promising avenues for future investigations.
    Beyond Intuition, a Framework for Applying GPs to Real-World Data. (arXiv:2307.03093v1 [cs.LG])
    Gaussian Processes (GPs) offer an attractive method for regression over small, structured and correlated datasets. However, their deployment is hindered by computational costs and limited guidelines on how to apply GPs beyond simple low-dimensional datasets. We propose a framework to identify the suitability of GPs to a given problem and how to set up a robust and well-specified GP model. The guidelines formalise the decisions of experienced GP practitioners, with an emphasis on kernel design and options for computational scalability. The framework is then applied to a case study of glacier elevation change yielding more accurate results at test time.
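As a concrete baseline for the setup the framework addresses, standard GP regression with an RBF kernel is only a few lines; the kernel, lengthscale, and noise/jitter term are precisely the design choices the guidelines formalize. A minimal numpy sketch:

```python
import numpy as np

def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior_mean(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean of a zero-mean GP: K*^T (K + sigma^2 I)^(-1) y."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    K_star = rbf(x_train, x_test)
    return K_star.T @ np.linalg.solve(K, y_train)
```

The O(n^3) solve in the last line is exactly the computational bottleneck the paper's scalability options (inducing points, structured kernels) are meant to address.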
    TablEye: Seeing small Tables through the Lens of Images. (arXiv:2307.02491v1 [cs.LG])
    The exploration of few-shot tabular learning becomes imperative. Tabular data is a versatile representation that captures diverse information, yet it is not exempt from limitations such as data properties and model size. Labeling extensive tabular data can be challenging, and it may not be feasible to capture every important feature. Few-shot tabular learning, however, remains relatively unexplored, primarily due to the scarcity of shared information among independent datasets and the inherent ambiguity in defining boundaries within tabular data. To the best of our knowledge, no meaningful and unrestricted few-shot tabular learning techniques have been developed without imposing constraints on the dataset. In this paper, we propose an innovative framework called TablEye, which aims to overcome the limit of forming prior knowledge for tabular data by adopting domain transformation. It facilitates domain transformation by generating tabular images, which effectively conserve the intrinsic semantics of the original tabular data. This approach harnesses rigorously tested few-shot learning algorithms and embedding functions to acquire and apply prior knowledge. Leveraging shared data domains allows us to utilize this prior knowledge, originally learned from the image domain. Specifically, TablEye demonstrated superior performance, outstripping TabLLM in a 4-shot task by a maximum 0.11 AUC and STUNT in a 1-shot setting, where it led on average by 3.17% accuracy.
    Push Past Green: Learning to Look Behind Plant Foliage by Moving It. (arXiv:2307.03175v1 [cs.RO])
    Autonomous agriculture applications (e.g., inspection, phenotyping, plucking fruits) require manipulating the plant foliage to look behind the leaves and the branches. Partial visibility, extreme clutter, thin structures, and unknown geometry and dynamics for plants make such manipulation challenging. We tackle these challenges through data-driven methods. We use self-supervision to train SRPNet, a neural network that predicts what space is revealed on execution of a candidate action on a given plant. We use SRPNet with the cross-entropy method to predict actions that are effective at revealing space beneath plant foliage. Furthermore, as SRPNet does not just predict how much space is revealed but also where it is revealed, we can execute a sequence of actions that incrementally reveal more and more space beneath the plant foliage. We experiment with a synthetic (vines) and a real plant (Dracaena) on a physical test-bed across 5 settings including 2 settings that test generalization to novel plant configurations. Our experiments reveal the effectiveness of our overall method, PPG, over a competitive hand-crafted exploration method, and the effectiveness of SRPNet over a hand-crafted dynamics model and relevant ablations.
    Benchmarking common uncertainty estimation methods with histopathological images under domain shift and label noise. (arXiv:2301.01054v2 [eess.IV] UPDATED)
    In the past years, deep learning has seen an increase in usage in the domain of histopathological applications. However, while these approaches have shown great potential, in high-risk environments deep learning models need to be able to judge their uncertainty and be able to reject inputs when there is a significant chance of misclassification. In this work, we conduct a rigorous evaluation of the most commonly used uncertainty and robustness methods for the classification of Whole Slide Images, with a focus on the task of selective classification, where the model should reject the classification in situations in which it is uncertain. We conduct our experiments on tile-level under the aspects of domain shift and label noise, as well as on slide-level. In our experiments, we compare Deep Ensembles, Monte-Carlo Dropout, Stochastic Variational Inference, Test-Time Data Augmentation as well as ensembles of the latter approaches. We observe that ensembles of methods generally lead to better uncertainty estimates as well as an increased robustness towards domain shifts and label noise, while contrary to results from classical computer vision benchmarks no systematic gain of the other methods can be shown. Across methods, a rejection of the most uncertain samples reliably leads to a significant increase in classification accuracy on both in-distribution as well as out-of-distribution data. Furthermore, we conduct experiments comparing these methods under varying conditions of label noise. Lastly, we publish our code framework to facilitate further research on uncertainty estimation on histopathological data.
    Fast and Multi-aspect Mining of Complex Time-stamped Event Streams. (arXiv:2303.03789v2 [cs.LG] UPDATED)
    Given a huge, online stream of time-evolving events with multiple attributes, such as online shopping logs: (item, price, brand, time), and local mobility activities: (pick-up and drop-off locations, time), how can we summarize large, dynamic high-order tensor streams? How can we see any hidden patterns, rules, and anomalies? Our answer is to focus on two types of patterns, i.e., ''regimes'' and ''components'', for which we present CubeScope, an efficient and effective method over high-order tensor streams. Specifically, it identifies any sudden discontinuity and recognizes distinct dynamical patterns, ''regimes'' (e.g., weekday/weekend/holiday patterns). In each regime, it also performs multi-way summarization for all attributes (e.g., item, price, brand, and time) and discovers hidden ''components'' representing latent groups (e.g., item/brand groups) and their relationship. Thanks to its concise but effective summarization, CubeScope can also detect the sudden appearance of anomalies and identify the types of anomalies that occur in practice. Our proposed method has the following properties: (a) Effective: it captures dynamical multi-aspect patterns, i.e., regimes and components, and statistically summarizes all the events; (b) General: it is practical for successful application to data compression, pattern discovery, and anomaly detection on various types of tensor streams; (c) Scalable: our algorithm does not depend on the length of the data stream and its dimensionality. Extensive experiments on real datasets demonstrate that CubeScope finds meaningful patterns and anomalies correctly, and consistently outperforms the state-of-the-art methods as regards accuracy and execution speed.
    When Does Confidence-Based Cascade Deferral Suffice?. (arXiv:2307.02764v1 [cs.LG])
    Cascades are a classical strategy to enable inference cost to vary adaptively across samples, wherein a sequence of classifiers are invoked in turn. A deferral rule determines whether to invoke the next classifier in the sequence, or to terminate prediction. One simple deferral rule employs the confidence of the current classifier, e.g., based on the maximum predicted softmax probability. Despite being oblivious to the structure of the cascade -- e.g., not modelling the errors of downstream models -- such confidence-based deferral often works remarkably well in practice. In this paper, we seek to better understand the conditions under which confidence-based deferral may fail, and when alternate deferral strategies can perform better. We first present a theoretical characterisation of the optimal deferral rule, which precisely characterises settings under which confidence-based deferral may suffer. We then study post-hoc deferral mechanisms, and demonstrate they can significantly improve upon confidence-based deferral in settings where (i) downstream models are specialists that only work well on a subset of inputs, (ii) samples are subject to label noise, and (iii) there is distribution shift between the train and test set.
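A minimal sketch of the confidence-based deferral rule analyzed here: run the cheap model and defer only when its maximum softmax probability falls below a threshold (the threshold value is illustrative):

```python
import numpy as np

def cascade(cheap_probs, expensive_fn, tau=0.8):
    """Two-stage cascade with confidence-based deferral.

    `cheap_probs`: softmax output of the first model for one sample.
    `expensive_fn`: called only when the cheap model is unsure (max prob < tau).
    Returns (predicted_class, deferred_flag).
    """
    conf = float(np.max(cheap_probs))
    if conf >= tau:
        return int(np.argmax(cheap_probs)), False
    return int(np.argmax(expensive_fn())), True
```

The paper's point is precisely that this rule, which ignores the downstream model entirely, is often near-optimal, except under specialist downstream models, label noise, or distribution shift, where post-hoc deferral rules do better.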
    SegNetr: Rethinking the local-global interactions and skip connections in U-shaped networks. (arXiv:2307.02953v1 [eess.IV])
    Recently, U-shaped networks have dominated the field of medical image segmentation due to their simple and easily tuned structure. However, existing U-shaped segmentation networks: 1) mostly focus on designing complex self-attention modules to compensate for the lack of long-term dependence based on convolution operation, which increases the overall number of parameters and computational complexity of the network; 2) simply fuse the features of encoder and decoder, ignoring the connection between their spatial locations. In this paper, we rethink the above problem and build a lightweight medical image segmentation network, called SegNetr. Specifically, we introduce a novel SegNetr block that can perform local-global interactions dynamically at any stage and with only linear complexity. At the same time, we design a general information retention skip connection (IRSC) to preserve the spatial location information of encoder features and achieve accurate fusion with the decoder features. We validate the effectiveness of SegNetr on four mainstream medical image segmentation datasets, with 59% and 76% fewer parameters and GFLOPs than vanilla U-Net, while achieving segmentation performance comparable to state-of-the-art methods. Notably, the components proposed in this paper can be applied to other U-shaped networks to improve their segmentation performance.
    Meta Learning of Interface Conditions for Multi-Domain Physics-Informed Neural Networks. (arXiv:2210.12669v2 [cs.LG] UPDATED)
Physics-informed neural networks (PINNs) are emerging as popular mesh-free solvers for partial differential equations (PDEs). Recent extensions decompose the domain, apply different PINNs to solve the problem in each subdomain, and stitch the subdomains together at the interface. Thereby, they can further alleviate the problem complexity, reduce the computational cost, and allow parallelization. However, the performance of multi-domain PINNs is sensitive to the choice of the interface conditions. While quite a few conditions have been proposed, there is no guidance on how to select them for a specific problem. To address this gap, we propose META Learning of Interface Conditions (METALIC), a simple, efficient yet powerful approach to dynamically determine appropriate interface conditions for solving a family of parametric PDEs. Specifically, we develop two contextual multi-armed bandit (MAB) models. The first applies to the entire training course, and updates online a Gaussian process (GP) reward model that predicts performance given the PDE parameters and interface conditions. We prove a sub-linear regret bound for both UCB and Thompson sampling, which in theory guarantees the effectiveness of our MAB. The second partitions the training into two stages, a stochastic phase and a deterministic phase; we update a GP reward model for each phase to enable different condition selections at the two stages, further bolstering flexibility and performance. We demonstrate the advantage of METALIC on four benchmark PDE families.
    Semi-supervised Learning from Street-View Images and OpenStreetMap for Automatic Building Height Estimation. (arXiv:2307.02574v1 [cs.CV])
Accurate building height estimation is key to the automatic derivation of 3D city models from emerging big geospatial data, including Volunteered Geographical Information (VGI). However, an automatic solution for large-scale building height estimation based on low-cost VGI data is currently missing. The fast development of VGI data platforms, especially OpenStreetMap (OSM) and crowdsourced street-view images (SVI), offers a stimulating opportunity to fill this research gap. In this work, we propose a semi-supervised learning (SSL) method of automatically estimating building height from Mapillary SVI and OSM data to generate low-cost and open-source 3D city modeling in LoD1. The proposed method consists of three parts: first, we propose an SSL schema with the option of setting a different ratio of "pseudo label" during the supervised regression; second, we extract multi-level morphometric features from OSM data (i.e., buildings and streets) for the purpose of inferring building height; last, we design a building floor estimation workflow with a pre-trained facade object detection network to generate "pseudo label" from SVI and assign it to the corresponding OSM building footprint. In a case study, we validate the proposed SSL method in the city of Heidelberg, Germany and evaluate the model performance against the reference data of building heights. Based on three different regression models, namely Random Forest (RF), Support Vector Machine (SVM), and Convolutional Neural Network (CNN), the SSL method leads to a clear performance boost in estimating building heights, with a Mean Absolute Error (MAE) around 2.1 meters, which is competitive to state-of-the-art approaches. The preliminary result is promising and motivates our future work in scaling up the proposed method based on low-cost VGI data, even in regions and areas with diverse data quality and availability.
    DPM: Clustering Sensitive Data through Separation. (arXiv:2307.02969v1 [cs.CR])
Privacy-preserving clustering groups data points in an unsupervised manner whilst ensuring that sensitive information remains protected. Previous work on privacy-preserving clustering has focused on identifying concentrations of point clouds. In this paper, we take another path and focus on identifying appropriate separators that split a data set. We introduce the novel differentially private clustering algorithm DPM that searches for accurate data point separators in a differentially private manner. DPM addresses two key challenges in finding accurate separators: preferring separators that lie in large gaps between clusters over small gaps within a cluster, and, to spend the privacy budget efficiently, prioritising separators that split the data into large subparts. Using the differentially private Exponential Mechanism, DPM randomly chooses cluster separators with provably high utility: For a data set $D$, if there is a wide low-density separator in the central $60\%$ quantile, DPM finds that separator with probability $1 - \exp(-\sqrt{|D|})$. Our experimental evaluation demonstrates that DPM achieves significant improvements in terms of the clustering metric inertia. With the inertia results of the non-private KMeans++ as a baseline, for $\varepsilon = 1$ and $\delta=10^{-5}$ DPM improves upon the difference to the baseline by up to $50\%$ for a synthetic data set and by up to $62\%$ for a real-world data set compared to a state-of-the-art clustering algorithm by Chang and Kamath.
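The core selection step, choosing a separator via the Exponential Mechanism, can be sketched as follows. This is a simplified toy (one-dimensional data, a naive gap-width utility whose sensitivity is simply assumed to be 1), not the actual DPM algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def exponential_mechanism(utilities, epsilon, sensitivity=1.0):
    # Sample index i with probability proportional to
    # exp(epsilon * u_i / (2 * sensitivity)).
    u = np.asarray(utilities, dtype=float)
    w = np.exp(epsilon * (u - u.max()) / (2 * sensitivity))
    return rng.choice(len(u), p=w / w.sum())

def choose_separator(data, candidates, epsilon):
    # Toy utility: the width of the empty gap around a candidate cut,
    # favouring wide low-density separators.
    data = np.sort(data)
    utils = []
    for c in candidates:
        left, right = data[data < c], data[data >= c]
        utils.append(right.min() - left.max() if len(left) and len(right) else 0.0)
    return candidates[exponential_mechanism(utils, epsilon)]

# Two well-separated clusters: the sampled cut should land in the gap.
data = np.concatenate([rng.normal(0, 0.3, 100), rng.normal(5, 0.3, 100)])
candidates = np.linspace(data.min(), data.max(), 50)
sep = choose_separator(data, candidates, epsilon=5.0)
print(sep)
```

Because the inter-cluster gap dominates every intra-cluster gap, the mechanism concentrates nearly all probability mass on cuts inside the gap while still being randomized.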
    A Hybrid End-to-End Spatio-Temporal Attention Neural Network with Graph-Smooth Signals for EEG Emotion Recognition. (arXiv:2307.03068v1 [cs.LG])
    Recently, physiological data such as electroencephalography (EEG) signals have attracted significant attention in affective computing. In this context, the main goal is to design an automated model that can assess emotional states. Lately, deep neural networks have shown promising performance in emotion recognition tasks. However, designing a deep architecture that can extract practical information from raw data is still a challenge. Here, we introduce a deep neural network that acquires interpretable physiological representations by a hybrid structure of spatio-temporal encoding and recurrent attention network blocks. Furthermore, a preprocessing step is applied to the raw data using graph signal processing tools to perform graph smoothing in the spatial domain. We demonstrate that our proposed architecture exceeds state-of-the-art results for emotion classification on the publicly available DEAP dataset. To explore the generality of the learned model, we also evaluate the performance of our architecture towards transfer learning (TL) by transferring the model parameters from a specific source to other target domains. Using DEAP as the source dataset, we demonstrate the effectiveness of our model in performing cross-modality TL and improving emotion classification accuracy on DREAMER and the Emotional English Word (EEWD) datasets, which involve EEG-based emotion classification tasks with different stimuli.
    Towards Symmetry-Aware Generation of Periodic Materials. (arXiv:2307.02707v1 [cs.LG])
    We consider the problem of generating periodic materials with deep models. While symmetry-aware molecule generation has been studied extensively, periodic materials possess different symmetries, which have not been completely captured by existing methods. In this work, we propose SyMat, a novel material generation approach that can capture physical symmetries of periodic material structures. SyMat generates atom types and lattices of materials through generating atom type sets, lattice lengths and lattice angles with a variational auto-encoder model. In addition, SyMat employs a score-based diffusion model to generate atom coordinates of materials, in which a novel symmetry-aware probabilistic model is used in the coordinate diffusion process. We show that SyMat is theoretically invariant to all symmetry transformations on materials and demonstrate that SyMat achieves promising performance on random generation and property optimization tasks.
    Density-based Feasibility Learning with Normalizing Flows for Introspective Robotic Assembly. (arXiv:2307.01317v2 [cs.RO] UPDATED)
    Machine Learning (ML) models in Robotic Assembly Sequence Planning (RASP) need to be introspective on the predicted solutions, i.e. whether they are feasible or not, to circumvent potential efficiency degradation. Previous works need both feasible and infeasible examples during training. However, the infeasible ones are hard to collect sufficiently when re-training is required for swift adaptation to new product variants. In this work, we propose a density-based feasibility learning method that requires only feasible examples. Concretely, we formulate the feasibility learning problem as Out-of-Distribution (OOD) detection with Normalizing Flows (NF), which are powerful generative models for estimating complex probability distributions. Empirically, the proposed method is demonstrated on robotic assembly use cases and outperforms other single-class baselines in detecting infeasible assemblies. We further investigate the internal working mechanism of our method and show that a large memory saving can be obtained based on an advanced variant of NF.
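The one-class idea can be illustrated with a stand-in density model: below, a Gaussian fit plays the role of the normalizing flow, and feasibility is a threshold on log-density. The class name and the 1% threshold are illustrative assumptions, not details from the paper:

```python
import numpy as np

class GaussianDensityOOD:
    """Stand-in for a normalizing flow: fit a Gaussian density to the
    feasible (in-distribution) examples, then flag low log-density
    queries as infeasible, i.e. out-of-distribution."""

    def fit(self, X, quantile=0.01):
        self.mu = X.mean(axis=0)
        self.cov = np.cov(X.T) + 1e-6 * np.eye(X.shape[1])
        self.inv = np.linalg.inv(self.cov)
        # Threshold: the 1% quantile of training log-densities.
        self.threshold = np.quantile(self.log_density(X), quantile)
        return self

    def log_density(self, X):
        d = X - self.mu
        maha = np.einsum('ij,jk,ik->i', d, self.inv, d)  # squared Mahalanobis
        _, logdet = np.linalg.slogdet(self.cov)
        return -0.5 * (maha + logdet + X.shape[1] * np.log(2 * np.pi))

    def is_feasible(self, X):
        return self.log_density(X) >= self.threshold

rng = np.random.default_rng(1)
feasible_only = rng.normal(0, 1, size=(500, 2))  # one-class training data
model = GaussianDensityOOD().fit(feasible_only)
flags = model.is_feasible(np.array([[0.0, 0.0], [8.0, 8.0]]))
print(flags)  # in-distribution point accepted, far-away point rejected
```

A normalizing flow replaces the Gaussian with a learned, far more expressive density, but the decision rule stays the same: threshold the estimated log-likelihood.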
    STS-CCL: Spatial-Temporal Synchronous Contextual Contrastive Learning for Urban Traffic Forecasting. (arXiv:2307.02507v1 [cs.LG])
Efficiently capturing complex spatiotemporal representations from large-scale unlabeled traffic data remains a challenging task. In view of this dilemma, this work employs advanced contrastive learning and proposes a novel Spatial-Temporal Synchronous Contextual Contrastive Learning (STS-CCL) model. First, we elaborate basic and strong augmentation methods for spatiotemporal graph data, which not only perturb the data in terms of graph structure and temporal characteristics, but also employ a learning-based dynamic graph view generator for adaptive augmentation. Second, we introduce a Spatial-Temporal Synchronous Contrastive Module (STS-CM) to simultaneously capture the essential spatial-temporal dependencies and realize graph-level contrasting. To further discriminate individual nodes during negative filtering, a Semantic Contextual Contrastive method is designed based on semantic features and spatial heterogeneity, achieving node-level contrastive learning along with negative filtering. Finally, we present a hard mutual-view contrastive training scheme and extend the classic contrastive loss to an integrated objective function, yielding better performance. Extensive experiments and evaluations demonstrate that building a predictor upon the STS-CCL contrastive learning model achieves superior performance over existing traffic forecasting benchmarks. The proposed STS-CCL is highly suitable for large datasets with only a few labeled samples, and for other spatiotemporal tasks with data scarcity issues.
    Large Language Models Empowered Autonomous Edge AI for Connected Intelligence. (arXiv:2307.02779v1 [cs.IT])
    The evolution of wireless networks gravitates towards connected intelligence, a concept that envisions seamless interconnectivity among humans, objects, and intelligence in a hyper-connected cyber-physical world. Edge AI emerges as a promising solution to achieve connected intelligence by delivering high-quality, low-latency, and privacy-preserving AI services at the network edge. In this article, we introduce an autonomous edge AI system that automatically organizes, adapts, and optimizes itself to meet users' diverse requirements. The system employs a cloud-edge-client hierarchical architecture, where the large language model, i.e., Generative Pretrained Transformer (GPT), resides in the cloud, and other AI models are co-deployed on devices and edge servers. By leveraging the powerful abilities of GPT in language understanding, planning, and code generation, we present a versatile framework that efficiently coordinates edge AI models to cater to users' personal demands while automatically generating code to train new models via edge federated learning. Experimental results demonstrate the system's remarkable ability to accurately comprehend user demands, efficiently execute AI models with minimal cost, and effectively create high-performance AI models through federated learning.
    Potential sources of dataset bias complicate investigation of underdiagnosis by machine learning algorithms. (arXiv:2201.07856v2 [cs.AI] UPDATED)
    An increasing number of reports raise concerns about the risk that machine learning algorithms could amplify health disparities due to biases embedded in the training data. Seyyed-Kalantari et al. find that models trained on three chest X-ray datasets yield disparities in false-positive rates (FPR) across subgroups on the 'no-finding' label (indicating the absence of disease). The models consistently yield higher FPR on subgroups known to be historically underserved, and the study concludes that the models exhibit and potentially even amplify systematic underdiagnosis. We argue that the experimental setup in the study is insufficient to study algorithmic underdiagnosis. In the absence of specific knowledge (or assumptions) about the extent and nature of the dataset bias, it is difficult to investigate model bias. Importantly, their use of test data exhibiting the same bias as the training data (due to random splitting) severely complicates the interpretation of the reported disparities.
    DataPerf: Benchmarks for Data-Centric AI Development. (arXiv:2207.10062v2 [cs.LG] UPDATED)
    Machine learning research has long focused on models rather than datasets, and prominent datasets are used for common ML tasks without regard to the breadth, difficulty, and faithfulness of the underlying problems. Neglecting the fundamental importance of data has given rise to inaccuracy, bias, and fragility in real-world applications, and research is hindered by saturation across existing dataset benchmarks. In response, we present DataPerf, a community-led benchmark suite for evaluating ML datasets and data-centric algorithms. We aim to foster innovation in data-centric AI through competition, comparability, and reproducibility. We enable the ML community to iterate on datasets, instead of just architectures, and we provide an open, online platform with multiple rounds of challenges to support this iterative development. The first iteration of DataPerf contains five benchmarks covering a wide spectrum of data-centric techniques, tasks, and modalities in vision, speech, acquisition, debugging, and diffusion prompting, and we support hosting new contributed benchmarks from the community. The benchmarks, online evaluation platform, and baseline implementations are open source, and the MLCommons Association will maintain DataPerf to ensure long-term benefits to academia and industry.
    Policy Contrastive Imitation Learning. (arXiv:2307.02829v1 [cs.LG])
Adversarial imitation learning (AIL) is a popular method that has recently achieved considerable success. However, the performance of AIL is still unsatisfactory on more challenging tasks. We find that one major reason is the low quality of the AIL discriminator's representation. Since the AIL discriminator is trained via binary classification that does not necessarily discriminate the policy from the expert in a meaningful way, the resulting reward might not be meaningful either. We propose a new method called Policy Contrastive Imitation Learning (PCIL) to resolve this issue. PCIL learns a contrastive representation space by anchoring on different policies and generates a smooth cosine-similarity-based reward. Our proposed representation learning objective can be viewed as a stronger version of the AIL objective and provides a more meaningful comparison between the agent and the expert. From a theoretical perspective, we show the validity of our method using the apprenticeship learning framework. Furthermore, our empirical evaluation on the DeepMind Control suite demonstrates that PCIL can achieve state-of-the-art performance. Finally, qualitative results suggest that PCIL builds a smoother and more meaningful representation space for imitation learning.
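The cosine-similarity reward can be sketched as follows; the embedding vectors here are hypothetical stand-ins for PCIL's learned contrastive representation:

```python
import numpy as np

def cosine_reward(z_agent, expert_embeddings):
    # Reward = mean cosine similarity between the agent's embedding and
    # the expert's embeddings: smooth and bounded in [-1, 1], unlike a
    # raw binary-classifier score.
    z = z_agent / np.linalg.norm(z_agent)
    E = expert_embeddings / np.linalg.norm(expert_embeddings, axis=1, keepdims=True)
    return float((E @ z).mean())

expert = np.array([[1.0, 0.0], [0.9, 0.1]])           # made-up expert embeddings
r_near = cosine_reward(np.array([1.0, 0.05]), expert)  # expert-like behaviour
r_far = cosine_reward(np.array([-1.0, 0.0]), expert)   # anti-expert behaviour
print(r_near, r_far)
```

The point of the smooth reward is that it degrades gracefully as the agent drifts away from the expert, rather than flipping abruptly as a classifier decision would.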
    Synthesizing Artistic Cinemagraphs from Text. (arXiv:2307.03190v1 [cs.CV])
    We introduce Artistic Cinemagraph, a fully automated method for creating cinemagraphs from text descriptions - an especially challenging task when prompts feature imaginary elements and artistic styles, given the complexity of interpreting the semantics and motions of these images. Existing single-image animation methods fall short on artistic inputs, and recent text-based video methods frequently introduce temporal inconsistencies, struggling to keep certain regions static. To address these challenges, we propose an idea of synthesizing image twins from a single text prompt - a pair of an artistic image and its pixel-aligned corresponding natural-looking twin. While the artistic image depicts the style and appearance detailed in our text prompt, the realistic counterpart greatly simplifies layout and motion analysis. Leveraging existing natural image and video datasets, we can accurately segment the realistic image and predict plausible motion given the semantic information. The predicted motion can then be transferred to the artistic image to create the final cinemagraph. Our method outperforms existing approaches in creating cinemagraphs for natural landscapes as well as artistic and other-worldly scenes, as validated by automated metrics and user studies. Finally, we demonstrate two extensions: animating existing paintings and controlling motion directions using text.
    Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework. (arXiv:2307.01715v2 [cs.CL] UPDATED)
    Connectionist Temporal Classification (CTC) is a widely used criterion for training supervised sequence-to-sequence (seq2seq) models. It enables learning the relations between input and output sequences, termed alignments, by marginalizing over perfect alignments (that yield the ground truth), at the expense of imperfect alignments. This binary differentiation of perfect and imperfect alignments falls short of capturing other essential alignment properties that hold significance in other real-world applications. Here we propose $\textit{Align With Purpose}$, a $\textbf{general Plug-and-Play framework}$ for enhancing a desired property in models trained with the CTC criterion. We do that by complementing the CTC with an additional loss term that prioritizes alignments according to a desired property. Our method does not require any intervention in the CTC loss function, enables easy optimization of a variety of properties, and allows differentiation between both perfect and imperfect alignments. We apply our framework in the domain of Automatic Speech Recognition (ASR) and show its generality in terms of property selection, architectural choice, and scale of training dataset (up to 280,000 hours). To demonstrate the effectiveness of our framework, we apply it to two unrelated properties: emission time and word error rate (WER). For the former, we report an improvement of up to 570ms in latency optimization with a minor reduction in WER, and for the latter, we report a relative improvement of 4.5% WER over the baseline models. To the best of our knowledge, these applications have never been demonstrated to work on a scale of data as large as ours. Notably, our method can be implemented using only a few lines of code, and can be extended to other alignment-free loss functions and to domains other than ASR.
    A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond. (arXiv:2204.09269v2 [cs.CL] UPDATED)
Non-autoregressive (NAR) generation, first proposed in neural machine translation (NMT) to speed up inference, has attracted much attention in both the machine learning and natural language processing communities. While NAR generation can significantly accelerate inference speed for machine translation, the speedup comes at the cost of sacrificed translation accuracy compared to its counterpart, autoregressive (AR) generation. In recent years, many new models and algorithms have been proposed to bridge the accuracy gap between NAR generation and AR generation. In this paper, we conduct a systematic survey with comparisons and discussions of various non-autoregressive translation (NAT) models from different aspects. Specifically, we categorize the efforts of NAT into several groups, including data manipulation, modeling methods, training criteria, decoding algorithms, and the benefit from pre-trained models. Furthermore, we briefly review other applications of NAR models beyond machine translation, such as grammatical error correction, text summarization, text style transfer, dialogue, semantic parsing, and automatic speech recognition. In addition, we discuss potential directions for future exploration, including relaxing the dependency on knowledge distillation (KD), reasonable training objectives, pre-training for NAR, and wider applications. We hope this survey can help researchers capture the latest progress in NAR generation, inspire the design of advanced NAR models and algorithms, and enable industry practitioners to choose appropriate solutions for their applications. The web page of this survey is at \url{https://github.com/LitterBrother-Xiao/Overview-of-Non-autoregressive-Applications}.
    Active Class Selection for Few-Shot Class-Incremental Learning. (arXiv:2307.02641v1 [cs.RO])
    For real-world applications, robots will need to continually learn in their environments through limited interactions with their users. Toward this, previous works in few-shot class incremental learning (FSCIL) and active class selection (ACS) have achieved promising results but were tested in constrained setups. Therefore, in this paper, we combine ideas from FSCIL and ACS to develop a novel framework that can allow an autonomous agent to continually learn new objects by asking its users to label only a few of the most informative objects in the environment. To this end, we build on a state-of-the-art (SOTA) FSCIL model and extend it with techniques from ACS literature. We term this model Few-shot Incremental Active class SeleCtiOn (FIASco). We further integrate a potential field-based navigation technique with our model to develop a complete framework that can allow an agent to process and reason on its sensory data through the FIASco model, navigate towards the most informative object in the environment, gather data about the object through its sensors and incrementally update the FIASco model. Experimental results on a simulated agent and a real robot show the significance of our approach for long-term real-world robotics applications.
    Layer-wise Adaptive Step-Sizes for Stochastic First-Order Methods for Deep Learning. (arXiv:2305.13664v3 [cs.LG] UPDATED)
    We propose a new per-layer adaptive step-size procedure for stochastic first-order optimization methods for minimizing empirical loss functions in deep learning, eliminating the need for the user to tune the learning rate (LR). The proposed approach exploits the layer-wise stochastic curvature information contained in the diagonal blocks of the Hessian in deep neural networks (DNNs) to compute adaptive step-sizes (i.e., LRs) for each layer. The method has memory requirements that are comparable to those of first-order methods, while its per-iteration time complexity is only increased by an amount that is roughly equivalent to an additional gradient computation. Numerical experiments show that SGD with momentum and AdamW combined with the proposed per-layer step-sizes are able to choose effective LR schedules and outperform fine-tuned LR versions of these methods as well as popular first-order and second-order algorithms for training DNNs on Autoencoder, Convolutional Neural Network (CNN) and Graph Convolutional Network (GCN) models. Finally, it is proved that an idealized version of SGD with the layer-wise step sizes converges linearly when using full-batch gradients.
    Renewable energy management in smart home environment via forecast embedded scheduling based on Recurrent Trend Predictive Neural Network. (arXiv:2307.01622v2 [cs.LG] UPDATED)
    Smart home energy management systems help the distribution grid operate more efficiently and reliably, and enable effective penetration of distributed renewable energy sources. These systems rely on robust forecasting, optimization, and control/scheduling algorithms that can handle the uncertain nature of demand and renewable generation. This paper proposes an advanced ML algorithm, called Recurrent Trend Predictive Neural Network based Forecast Embedded Scheduling (rTPNN-FES), to provide efficient residential demand control. rTPNN-FES is a novel neural network architecture that simultaneously forecasts renewable energy generation and schedules household appliances. By its embedded structure, rTPNN-FES eliminates the utilization of separate algorithms for forecasting and scheduling and generates a schedule that is robust against forecasting errors. This paper also evaluates the performance of the proposed algorithm for an IoT-enabled smart home. The evaluation results reveal that rTPNN-FES provides near-optimal scheduling $37.5$ times faster than the optimization while outperforming state-of-the-art forecasting techniques.
    Improving the Efficiency of Human-in-the-Loop Systems: Adding Artificial to Human Experts. (arXiv:2307.03003v1 [cs.LG])
    Information systems increasingly leverage artificial intelligence (AI) and machine learning (ML) to generate value from vast amounts of data. However, ML models are imperfect and can generate incorrect classifications. Hence, human-in-the-loop (HITL) extensions to ML models add a human review for instances that are difficult to classify. This study argues that continuously relying on human experts to handle difficult model classifications leads to a strong increase in human effort, which strains limited resources. To address this issue, we propose a hybrid system that creates artificial experts that learn to classify data instances from unknown classes previously reviewed by human experts. Our hybrid system assesses which artificial expert is suitable for classifying an instance from an unknown class and automatically assigns it. Over time, this reduces human effort and increases the efficiency of the system. Our experiments demonstrate that our approach outperforms traditional HITL systems for several benchmarks on image classification.
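The routing step of such a hybrid system might look like the following sketch, where the `threshold` value and the `'human'` fallback are illustrative assumptions rather than details from the study:

```python
def route(confidences, threshold=0.75):
    """confidences: each artificial expert's max class probability for
    one instance. Return the index of a sufficiently confident artificial
    expert, or 'human' to fall back to the human reviewer."""
    best = max(range(len(confidences)), key=lambda i: confidences[i])
    return best if confidences[best] >= threshold else "human"

print(route([0.60, 0.92, 0.70]))  # artificial expert 1 takes the instance
print(route([0.50, 0.60, 0.40]))  # nobody is confident: defer to the human
```

Over time, as artificial experts are trained on classes the humans have reviewed, more instances clear the threshold and human effort drops.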
    Hierarchical Empowerment: Towards Tractable Empowerment-Based Skill-Learning. (arXiv:2307.02728v1 [cs.LG])
    General purpose agents will require large repertoires of skills. Empowerment -- the maximum mutual information between skills and the states -- provides a pathway for learning large collections of distinct skills, but mutual information is difficult to optimize. We introduce a new framework, Hierarchical Empowerment, that makes computing empowerment more tractable by integrating concepts from Goal-Conditioned Hierarchical Reinforcement Learning. Our framework makes two specific contributions. First, we introduce a new variational lower bound on mutual information that can be used to compute empowerment over short horizons. Second, we introduce a hierarchical architecture for computing empowerment over exponentially longer time scales. We verify the contributions of the framework in a series of simulated robotics tasks. In a popular ant navigation domain, our four level agents are able to learn skills that cover a surface area over two orders of magnitude larger than prior work.
    Learning Disentangled Representations in Signed Directed Graphs without Social Assumptions. (arXiv:2307.03077v1 [cs.LG])
    Signed graphs are complex systems that represent trust relationships or preferences in various domains. Learning node representations in such graphs is crucial for many mining tasks. Although real-world signed relationships can be influenced by multiple latent factors, most existing methods often oversimplify the modeling of signed relationships by relying on social theories and treating them as simplistic factors. This limits their expressiveness and their ability to capture the diverse factors that shape these relationships. In this paper, we propose DINES, a novel method for learning disentangled node representations in signed directed graphs without social assumptions. We adopt a disentangled framework that separates each embedding into distinct factors, allowing for capturing multiple latent factors. We also explore lightweight graph convolutions that focus solely on sign and direction, without depending on social theories. Additionally, we propose a decoder that effectively classifies an edge's sign by considering correlations between the factors. To further enhance disentanglement, we jointly train a self-supervised factor discriminator with our encoder and decoder. Throughout extensive experiments on real-world signed directed graphs, we show that DINES effectively learns disentangled node representations, and significantly outperforms its competitors in the sign prediction task.
    Topology-Aware Loss for Aorta and Great Vessel Segmentation in Computed Tomography Images. (arXiv:2307.03137v1 [eess.IV])
Segmentation networks are not explicitly imposed to learn global invariants of an image, such as the shape of an object and the geometry between multiple objects, when they are trained with a standard loss function. On the other hand, incorporating such invariants into network training may help improve performance for various segmentation tasks when they are the intrinsic characteristics of the objects to be segmented. One example is segmentation of the aorta and great vessels in computed tomography (CT) images, where vessels are found in a particular geometry in the body due to the human anatomy and mostly appear as round objects on a 2D CT image. This paper addresses this issue by introducing a new topology-aware loss function that penalizes topology dissimilarities between the ground truth and prediction through persistent homology. Different from the previously suggested segmentation network designs, which apply the threshold filtration on a likelihood function of the prediction map and the Betti numbers of the ground truth, this paper proposes to apply the Vietoris-Rips filtration to obtain persistence diagrams of both ground truth and prediction maps and calculate the dissimilarity with the Wasserstein distance between the corresponding persistence diagrams. The use of this filtration has the advantage of modeling shape and geometry at the same time, which may not happen when the threshold filtration is applied. Our experiments on 4327 CT images of 24 subjects reveal that the proposed topology-aware loss function leads to better results than its counterparts, indicating the effectiveness of the proposed approach.
    Convolutional Filtering and Neural Networks with Non Commutative Algebras. (arXiv:2108.09923v3 [cs.LG] UPDATED)
    In this paper we introduce and study the algebraic generalization of non commutative convolutional neural networks. We leverage the theory of algebraic signal processing to model convolutional non commutative architectures, and we derive concrete stability bounds that extend those obtained in the literature for commutative convolutional neural networks. We show that non commutative convolutional architectures can be stable to deformations on the space of operators. We develop the spectral representation of non commutative signal models to show that non commutative filters process Fourier components independently of each other. In particular we prove that although the spectral decompositions of signals in non commutative models are associated to eigenspaces of dimension larger than one, there exists a trade-off between stability and selectivity, which is controlled by matrix polynomial functions in spaces of matrices of low dimension. This tradeoff shows how when the filters in the algebra are restricted to be stable, there is a loss in discriminability that is compensated in the network by the pointwise nonlinearities. The results derived in this paper have direct applications and implications in non commutative convolutional architectures such as group neural networks, multigraph neural networks, and quaternion neural networks, for which we provide a set of numerical experiments showing their behavior when perturbations are present.
    PUFFIN: A Path-Unifying Feed-Forward Interfaced Network for Vapor Pressure Prediction. (arXiv:2307.02903v1 [physics.chem-ph])
    Accurately predicting vapor pressure is vital for various industrial and environmental applications. However, obtaining accurate measurements for all compounds of interest is not possible due to the resource and labor intensity of experiments. The demand for resources and labor further multiplies when a temperature-dependent relationship for predicting vapor pressure is desired. In this paper, we propose PUFFIN (Path-Unifying Feed-Forward Interfaced Network), a machine learning framework that combines transfer learning with a new inductive bias node inspired by domain knowledge (the Antoine equation) to improve vapor pressure prediction. By leveraging inductive bias and transfer learning using graph embeddings, PUFFIN outperforms alternative strategies that do not use inductive bias or that use generic descriptors of compounds. The framework's incorporation of domain-specific knowledge to overcome the limitation of poor data availability shows its potential for broader applications in chemical compound analysis, including the prediction of other physicochemical properties. Importantly, our proposed machine learning framework is partially interpretable, because the inductive Antoine node yields network-derived Antoine equation coefficients. It would then be possible to directly incorporate the obtained analytical expression in process design software for better prediction and control of processes occurring in industry and the environment.
    TGRL: An Algorithm for Teacher Guided Reinforcement Learning. (arXiv:2307.03186v1 [cs.LG])
    Learning from rewards (i.e., reinforcement learning or RL) and learning to imitate a teacher (i.e., teacher-student learning) are two established approaches for solving sequential decision-making problems. To combine the benefits of these different forms of learning, it is common to train a policy to maximize a combination of reinforcement and teacher-student learning objectives. However, without a principled method to balance these objectives, prior work used heuristics and problem-specific hyperparameter searches to balance the two objectives. We present a $\textit{principled}$ approach, along with an approximate implementation for $\textit{dynamically}$ and $\textit{automatically}$ balancing when to follow the teacher and when to use rewards. The main idea is to adjust the importance of teacher supervision by comparing the agent's performance to the counterfactual scenario of the agent learning without teacher supervision and only from rewards. If using teacher supervision improves performance, the importance of teacher supervision is increased and otherwise it is decreased. Our method, $\textit{Teacher Guided Reinforcement Learning}$ (TGRL), outperforms strong baselines across diverse domains without hyper-parameter tuning.
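The balancing idea can be sketched in a few lines: scale the teacher-distillation term up when the agent's performance with teacher supervision beats a counterfactual reward-only learner, and down otherwise. This is a loose illustration of the principle described in the abstract, not TGRL itself; all names and the additive step rule are assumptions.

```python
def balance_teacher_weight(alpha, perf_with_teacher, perf_reward_only,
                           step_size=0.05, min_alpha=0.0, max_alpha=1.0):
    """Increase the teacher-supervision weight when following the teacher
    helps relative to the reward-only counterfactual; decrease it otherwise.
    (Hypothetical update rule, for illustration only.)"""
    if perf_with_teacher > perf_reward_only:
        alpha += step_size
    else:
        alpha -= step_size
    return min(max(alpha, min_alpha), max_alpha)

def combined_loss(rl_loss, distill_loss, alpha):
    """Objective mixing reinforcement and teacher-student terms."""
    return (1.0 - alpha) * rl_loss + alpha * distill_loss
```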
    Chaos Theory and Adversarial Robustness. (arXiv:2210.13235v2 [cs.LG] UPDATED)
    Neural networks, being susceptible to adversarial attacks, should face a strict level of scrutiny before being deployed in critical or adversarial applications. This paper uses ideas from Chaos Theory to explain, analyze, and quantify the degree to which neural networks are susceptible to or robust against adversarial attacks. To this end, we present a new metric, the "susceptibility ratio," given by $\hat \Psi(h, \theta)$, which captures how greatly a model's output will be changed by perturbations to a given input. Our results show that susceptibility to attack grows significantly with the depth of the model, which has safety implications for the design of neural networks for production environments. We provide experimental evidence of the relationship between $\hat \Psi$ and the post-attack accuracy of classification models, as well as a discussion of its application to tasks lacking hard decision boundaries. We also demonstrate how to quickly and easily approximate the certified robustness radii for extremely large models, which until now has been computationally infeasible to calculate directly.
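The paper defines its own susceptibility ratio $\hat \Psi(h, \theta)$; a natural finite-difference estimate of the underlying quantity (how much output change a small input perturbation induces) can be sketched as follows. The averaging scheme here is an assumption for illustration, not the paper's exact metric.

```python
import numpy as np

def susceptibility_ratio(model, x, eps=1e-3, trials=100, seed=0):
    """Empirical amplification of small input perturbations at the output:
    the mean of ||f(x + d) - f(x)|| / ||d|| over random directions d."""
    rng = np.random.default_rng(seed)
    fx = model(x)
    ratios = []
    for _ in range(trials):
        d = rng.normal(size=x.shape)
        d *= eps / np.linalg.norm(d)       # perturbation of norm eps
        ratios.append(np.linalg.norm(model(x + d) - fx) / eps)
    return float(np.mean(ratios))
```

For a linear map the estimate recovers the scaling factor exactly; for a deep network it grows with depth, matching the abstract's observation.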
    Offline Reinforcement Learning with Imbalanced Datasets. (arXiv:2307.02752v1 [cs.LG])
    The prevalent use of benchmarks in current offline reinforcement learning (RL) research has led to a neglect of the imbalance of real-world dataset distributions in the development of models. Real-world offline RL datasets are often imbalanced over the state space due to the challenge of exploration or safety considerations. In this paper, we specify properties of imbalanced datasets in offline RL, where the state coverage follows a power law distribution characterized by skewed policies. Theoretically and empirically, we show that typical offline RL methods based on distributional constraints, such as conservative Q-learning (CQL), are ineffective in extracting policies under the imbalanced dataset. Inspired by natural intelligence, we propose a novel offline RL method that utilizes the augmentation of CQL with a retrieval process to recall past related experiences, effectively alleviating the challenges posed by imbalanced datasets. We evaluate our method on several tasks in the context of imbalanced datasets with varying levels of imbalance, using a variant of D4RL. Empirical results demonstrate the superiority of our method over other baselines.
    OLR-WA Online Regression with Weighted Average. (arXiv:2307.02804v1 [cs.LG])
    Machine learning requires a large amount of training data in order to build accurate models. Sometimes the data arrives over time, requiring significant storage space and recalculation of the model to account for the new data. Online learning addresses these issues by incrementally modifying the model as data is encountered, and then discarding the data. In this study we introduce a new online linear regression approach. Our approach combines newly arriving data with a previously existing model to create a new model. The introduced model, named OLR-WA (OnLine Regression with Weighted Average), uses user-defined weights to provide flexibility in the face of changing data to bias the results in favor of old or new data. We have conducted 2-D and 3-D experiments comparing OLR-WA to a static model using the entire data set. The results show that for consistent data, OLR-WA and the static batch model perform similarly, and for varying data, the user can set OLR-WA to adapt more quickly or to resist change.
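The weighted-average idea can be sketched with ordinary least squares: fit a model on the incoming batch, then blend its coefficients with the existing model using the user-defined weights. This simple coefficient average is an assumed simplification for illustration; OLR-WA's actual averaging of models may differ in detail.

```python
import numpy as np

def ols_fit(X, y):
    """Ordinary least squares with intercept; returns [slopes..., intercept]."""
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def online_update(old_coef, X_new, y_new, w_old=0.5, w_new=0.5):
    """Blend the existing model with a model fit on the incoming batch.
    w_old / w_new bias the result toward old or new data, then the batch
    itself can be discarded."""
    new_coef = ols_fit(X_new, y_new)
    return (w_old * old_coef + w_new * new_coef) / (w_old + w_new)
```

Setting `w_new` high makes the model adapt quickly to drifting data; a high `w_old` makes it resist change.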
    Generation of Highlights from Research Papers Using Pointer-Generator Networks and SciBERT Embeddings. (arXiv:2302.07729v2 [cs.CL] UPDATED)
    Nowadays many research articles are prefaced with research highlights to summarize the main findings of the paper. Highlights not only help researchers precisely and quickly identify the contributions of a paper, they also enhance the discoverability of the article via search engines. We aim to automatically construct research highlights given certain segments of a research paper. We use a pointer-generator network with coverage mechanism and a contextual embedding layer at the input that encodes the input tokens into SciBERT embeddings. We test our model on a benchmark dataset, CSPubSum, and also present MixSub, a new multi-disciplinary corpus of papers for automatic research highlight generation. For both CSPubSum and MixSub, we have observed that the proposed model achieves the best performance compared to related variants and other models proposed in the literature. On the CSPubSum dataset, our model achieves the best performance when the input is only the abstract of a paper as opposed to other segments of the paper. It produces ROUGE-1, ROUGE-2 and ROUGE-L F1-scores of 38.26, 14.26 and 35.51, respectively, METEOR score of 32.62, and BERTScore F1 of 86.65 which outperform all other baselines. On the new MixSub dataset, where only the abstract is the input, our proposed model (when trained on the whole training corpus without distinguishing between the subject categories) achieves ROUGE-1, ROUGE-2 and ROUGE-L F1-scores of 31.78, 9.76 and 29.3, respectively, METEOR score of 24.00, and BERTScore F1 of 85.25.
    Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge. (arXiv:2307.02894v1 [cs.LG])
    Mixed-precision quantization, where a deep neural network's layers are quantized to different precisions, offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy beyond what can be achieved with homogeneous-bit-width quantization. To navigate the intractable search space of mixed-precision configurations for a given network, this paper proposes a hybrid search methodology. It consists of a hardware-agnostic differentiable search algorithm followed by a hardware-aware heuristic optimization to find mixed-precision configurations latency-optimized for a specific hardware target. We evaluate our algorithm on MobileNetV1 and MobileNetV2 and deploy the resulting networks on a family of multi-core RISC-V microcontroller platforms with different hardware characteristics. We achieve up to 28.6% reduction of end-to-end latency compared to an 8-bit model at a negligible accuracy drop from a full-precision baseline on the 1000-class ImageNet dataset. We demonstrate speedups relative to an 8-bit baseline, even on systems with no hardware support for sub-byte arithmetic at negligible accuracy drop. Furthermore, we show the superiority of our approach with respect to differentiable search targeting reduced binary operation counts as a proxy for latency.
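A mixed-precision configuration assigns a bit width per layer; the statistical cost of a candidate configuration can be probed by quantizing each layer's weights at its assigned precision and measuring the error. The symmetric uniform quantizer and helper names below are illustrative assumptions, not the paper's search algorithm.

```python
import numpy as np

def quantize(w, bits):
    """Symmetric uniform quantization of a weight tensor to `bits` bits."""
    levels = 2 ** (bits - 1) - 1          # signed integer grid
    scale = np.abs(w).max() / levels
    return np.round(w / scale) * scale

def mixed_precision_error(layers, bit_config):
    """Per-layer MSE for a candidate mixed-precision configuration."""
    return [float(np.mean((w - quantize(w, b)) ** 2))
            for w, b in zip(layers, bit_config)]
```

A search procedure like the one in the paper would trade these per-layer errors against the latency of each bit width on the target hardware.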
    Statistical-Computational Tradeoffs in Mixed Sparse Linear Regression. (arXiv:2303.02118v2 [stat.ML] UPDATED)
    We consider the problem of mixed sparse linear regression with two components, where two real $k$-sparse signals $\beta_1, \beta_2$ are to be recovered from $n$ unlabelled noisy linear measurements. The sparsity is allowed to be sublinear in the dimension, and additive noise is assumed to be independent Gaussian with variance $\sigma^2$. Prior work has shown that the problem suffers from a $\frac{k}{SNR^2}$-to-$\frac{k^2}{SNR^2}$ statistical-to-computational gap, resembling other computationally challenging high-dimensional inference problems such as Sparse PCA and Robust Sparse Mean Estimation; here $SNR$ is the signal-to-noise ratio. We establish the existence of a more extensive computational barrier for this problem through the method of low-degree polynomials, but show that the problem is computationally hard only in a very narrow symmetric parameter regime. We identify a smooth information-computation tradeoff between the sample complexity $n$ and runtime for any randomized algorithm in this hard regime. Via a simple reduction, this provides novel rigorous evidence for the existence of a computational barrier to solving exact support recovery in sparse phase retrieval with sample complexity $n = \tilde{o}(k^2)$. Our second contribution is to analyze a simple thresholding algorithm which, outside of the narrow regime where the problem is hard, solves the associated mixed regression detection problem in $O(np)$ time with square-root the number of samples and matches the sample complexity required for (non-mixed) sparse linear regression; this allows the recovery problem to be subsequently solved by state-of-the-art techniques from the dense case. As a special case of our results, we show that this simple algorithm is order-optimal among a large family of algorithms in solving exact signed support recovery in sparse linear regression.
    Time Series Clustering With Random Convolutional Kernels. (arXiv:2305.10457v2 [cs.LG] UPDATED)
    Time series data, spanning applications ranging from climatology to finance to healthcare, presents significant challenges in data mining due to its size and complexity. One open issue lies in time series clustering, which is crucial for processing large volumes of unlabeled time series data and unlocking valuable insights. Traditional and modern analysis methods, however, often struggle with these complexities. To address these limitations, we introduce R-Clustering, a novel method that utilizes convolutional architectures with randomly selected parameters. Through extensive evaluations, R-Clustering demonstrates superior performance over existing methods in terms of clustering accuracy, computational efficiency and scalability. Empirical results obtained using the UCR archive demonstrate the effectiveness of our approach across diverse time series datasets. The findings highlight the significance of R-Clustering in various domains and applications, contributing to the advancement of time series data mining.
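The core mechanism, random convolutional kernels followed by clustering, can be sketched with ROCKET-style proportion-of-positive-values (PPV) features and a plain k-means loop. This is an assumed minimal reconstruction from the abstract, not the R-Clustering implementation.

```python
import numpy as np

def random_kernel_features(series, n_kernels=64, kernel_len=9, seed=0):
    """Convolve each series with randomly initialized kernels and record
    the proportion of positive values (PPV) per kernel as a feature."""
    rng = np.random.default_rng(seed)
    kernels = rng.normal(size=(n_kernels, kernel_len))
    feats = np.empty((len(series), n_kernels))
    for i, x in enumerate(series):
        for j, k in enumerate(kernels):
            conv = np.convolve(x, k, mode="valid")
            feats[i, j] = (conv > 0).mean()
    return feats

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means on the feature matrix; returns cluster labels."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)
    return labels
```

Because the kernel parameters are never trained, the feature extraction cost stays low, which is what gives this family of methods its scalability.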
    Exploring new ways: Enforcing representational dissimilarity to learn new features and reduce error consistency. (arXiv:2307.02516v1 [cs.LG])
    Independently trained machine learning models tend to learn similar features. Given an ensemble of independently trained models, this results in correlated predictions and common failure modes. Previous attempts at decorrelating output predictions or logits yielded mixed results, particularly due to the reduction in model accuracy caused by conflicting optimization objectives. In this paper, we propose the novel idea of utilizing methods from the representational similarity field to promote dissimilarity during training instead of measuring the similarity of trained models. To this end, we promote intermediate representations to be dissimilar at different depths between architectures, with the goal of learning robust ensembles with disjoint failure modes. We show that highly dissimilar intermediate representations result in less correlated output predictions and slightly lower error consistency, resulting in higher ensemble accuracy. With this, we shed first light on the connection between intermediate representations and their impact on the output predictions.
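One standard measure from the representational similarity field is linear Centered Kernel Alignment (CKA); minimizing it between two networks' intermediate representations during training is one way to enforce dissimilarity. The choice of CKA here is an illustrative assumption; the paper may use a different similarity measure.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two representation
    matrices (rows = examples, columns = features). Returns a value in
    [0, 1], with 1 meaning the representations are equivalent up to an
    orthogonal transform and isotropic scaling."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") *
                   np.linalg.norm(Y.T @ Y, "fro"))

def dissimilarity_penalty(X, Y):
    """Training penalty: minimizing CKA pushes representations apart."""
    return linear_cka(X, Y)
```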
    Multimodal Temporal Fusion Transformers Are Good Product Demand Forecasters. (arXiv:2307.02578v1 [cs.LG])
    Multimodal demand forecasting aims at predicting product demand utilizing visual, textual, and contextual information. This paper proposes a method for multimodal product demand forecasting using convolutional, graph-based, and transformer-based architectures. Traditional approaches to demand forecasting rely on historical demand, product categories, and additional contextual information such as seasonality and events. However, these approaches have several shortcomings, such as the cold start problem making it difficult to predict product demand until sufficient historical data is available for a particular product, and their inability to properly deal with category dynamics. By incorporating multimodal information, such as product images and textual descriptions, our architecture aims to address the shortcomings of traditional approaches and outperform them. The experiments conducted on a large real-world dataset show that the proposed approach effectively predicts demand for a wide range of products. The multimodal pipeline presented in this work enhances the accuracy and reliability of the predictions, demonstrating the potential of leveraging multimodal information in product demand forecasting.
    Pruning vs Quantization: Which is Better?. (arXiv:2307.02973v1 [cs.LG])
    Neural network pruning and quantization techniques are almost as old as neural networks themselves. However, to date only ad-hoc comparisons between the two have been published. In this paper, we set out to answer the question of which is better: neural network quantization or pruning? By answering this question, we hope to inform design decisions made on neural network hardware going forward. We provide an extensive comparison between the two techniques for compressing deep neural networks. First, we give an analytical comparison of expected quantization and pruning error for general data distributions. Then, we provide lower bounds for the per-layer pruning and quantization error in trained networks, and compare these to empirical error after optimization. Finally, we provide an extensive experimental comparison for training 8 large-scale models on 3 tasks. Our results show that in most cases quantization outperforms pruning. Only in some scenarios with a very high compression ratio might pruning be beneficial from an accuracy standpoint.
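The per-layer comparison can be reproduced in miniature: magnitude pruning keeps a fraction of the largest weights, uniform quantization rounds all weights to a `bits`-bit grid, and the two can be compared at roughly matched compression (e.g. keeping `bits/32` of 32-bit weights, ignoring index overhead). This is an assumed toy setup in the spirit of the paper's analytical comparison, not its actual methodology.

```python
import numpy as np

def prune_error(w, keep_frac):
    """MSE from magnitude pruning: zero all but the largest-magnitude weights."""
    k = int(len(w) * keep_frac)
    thresh = np.sort(np.abs(w))[-k] if k > 0 else np.inf
    pruned = np.where(np.abs(w) >= thresh, w, 0.0)
    return float(np.mean((w - pruned) ** 2))

def quant_error(w, bits):
    """MSE from symmetric uniform quantization to `bits` bits."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / levels
    return float(np.mean((w - np.round(w / scale) * scale) ** 2))
```

For Gaussian-distributed weights at matched compression, the quantization error comes out far smaller, consistent with the abstract's conclusion.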
    Transfer Learning for the Efficient Detection of COVID-19 from Smartphone Audio Data. (arXiv:2307.02975v1 [cs.LG])
    Disease detection from smartphone data represents an open research challenge in mobile health (m-health) systems. COVID-19 and its respiratory symptoms are an important case study in this area and their early detection is a potential real instrument to counteract the pandemic situation. The efficacy of this solution mainly depends on the performance of AI algorithms applied to the collected data and their possible implementation directly on the users' mobile devices. Considering these issues, and the limited amount of available data, in this paper we present the experimental evaluation of 3 different deep learning models, compared also with hand-crafted features, and of two main approaches of transfer learning in the considered scenario: both feature extraction and fine-tuning. Specifically, we considered VGGish, YAMNET, and L\textsuperscript{3}-Net (including 12 different configurations) evaluated through user-independent experiments on 4 different datasets (13,447 samples in total). Results clearly show the advantages of L\textsuperscript{3}-Net in all the experimental settings as it outperforms the other solutions by 12.3\% in terms of Precision-Recall AUC as a feature extractor, and by 10\% when the model is fine-tuned. Moreover, we note that fine-tuning only the fully-connected layers of the pre-trained models generally leads to worse performance, with an average drop of 6.6\% with respect to feature extraction. Finally, we evaluate the memory footprints of the different models for their possible applications on commercial mobile devices.
    A Recursively Recurrent Neural Network (R2N2) Architecture for Learning Iterative Algorithms. (arXiv:2211.12386v2 [cs.LG] UPDATED)
    Meta-learning of numerical algorithms for a given task consists of the data-driven identification and adaptation of an algorithmic structure and the associated hyperparameters. To limit the complexity of the meta-learning problem, neural architectures with a certain inductive bias towards favorable algorithmic structures can, and should, be used. We generalize our previously introduced Runge-Kutta neural network to a recursively recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms. In contrast to off-the-shelf deep learning approaches, it features a distinct division into modules for generation of information and for the subsequent assembly of this information towards a solution. Local information in the form of a subspace is generated by subordinate, inner, iterations of recurrent function evaluations starting at the current outer iterate. The update to the next outer iterate is computed as a linear combination of these evaluations, reducing the residual in this space, and constitutes the output of the network. We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields iterations similar to Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta integrators for ordinary differential equations. Due to its modularity, the superstructure can be readily extended with functionalities needed to represent more general classes of iterative algorithms traditionally based on Taylor series expansions.
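The inner/outer structure described here mirrors classical restarted Krylov methods: inner recurrent evaluations build a subspace from the current residual, and the outer update is the residual-minimizing linear combination over that subspace. The sketch below shows that classical structure (essentially a restarted GMRES-like step via least squares), which the R2N2 superstructure generalizes with learned weights; it is not the paper's network.

```python
import numpy as np

def krylov_step(A, b, x, m=5):
    """One outer iteration: inner recurrent evaluations generate a Krylov
    subspace from the residual; the outer update is the combination of
    those evaluations that minimizes the new residual."""
    r = b - A @ x
    if np.linalg.norm(r) < 1e-12:
        return x
    v = r / np.linalg.norm(r)
    V = [v]
    for _ in range(m - 1):                 # inner iterations
        v = A @ v
        v = v / np.linalg.norm(v)
        V.append(v)
    V = np.stack(V, axis=1)                # n x m subspace basis
    # coefficients minimizing || b - A (x + V c) || in least-squares sense
    c, *_ = np.linalg.lstsq(A @ V, r, rcond=None)
    return x + V @ c

def solve(A, b, outer=10, m=5):
    x = np.zeros_like(b)
    for _ in range(outer):                 # outer iterations
        x = krylov_step(A, b, x, m)
    return x
```

In the R2N2, the fixed linear-algebra steps above are replaced by trainable modules, which is what lets the same superstructure recover Newton-Krylov solvers and Runge-Kutta integrators.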
    Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation. (arXiv:2307.02598v1 [cs.LG])
    We tackle the problems of latent variables identification and "out-of-support" image generation in representation learning. We show that both are possible for a class of decoders that we call additive, which are reminiscent of decoders used for object-centric representation learning (OCRL) and well suited for images that can be decomposed as a sum of object-specific images. We provide conditions under which exactly solving the reconstruction problem using an additive decoder is guaranteed to identify the blocks of latent variables up to permutation and block-wise invertible transformations. This guarantee relies only on very weak assumptions about the distribution of the latent factors, which might present statistical dependencies and have an almost arbitrarily shaped support. Our result provides a new setting where nonlinear independent component analysis (ICA) is possible and adds to our theoretical understanding of OCRL methods. We also show theoretically that additive decoders can generate novel images by recombining observed factors of variations in novel ways, an ability we refer to as Cartesian-product extrapolation. We show empirically that additivity is crucial for both identifiability and extrapolation on simulated data.
    Airfoil GAN: Encoding and Synthesizing Airfoils for Aerodynamic Shape Optimization. (arXiv:2101.04757v2 [cs.CE] UPDATED)
    The current design of aerodynamic shapes, like airfoils, involves computationally intensive simulations to explore the possible design space. Usually, such design relies on the prior definition of design parameters and places restrictions on synthesizing novel shapes. In this work, we propose a data-driven shape encoding and generating method, which automatically learns representations from existing airfoils and uses the learned representations to generate new airfoils. The representations are then used in the optimization of synthesized airfoil shapes based on their aerodynamic performance. Our model is built upon VAEGAN, a neural network that combines a Variational Autoencoder with a Generative Adversarial Network and is trained with gradient-based techniques. Our model can (1) encode an existing airfoil into a latent vector and reconstruct the airfoil from it, (2) generate novel airfoils by randomly sampling latent vectors and mapping the vectors to the airfoil coordinate domain, and (3) synthesize airfoils with desired aerodynamic properties by optimizing learned features via a genetic algorithm. Our experiments show that the learned features encode shape information thoroughly and comprehensively without predefined design parameters. By interpolating/extrapolating feature vectors or sampling from Gaussian noise, the model can automatically synthesize novel airfoil shapes, some of which possess competitive or even better aerodynamic properties compared to the airfoils used for model training. By optimizing shapes on the learned latent domain via a genetic algorithm, synthesized airfoils can evolve toward target aerodynamic properties. This demonstrates an efficient learning-based airfoil design framework, which encodes and optimizes the airfoil on the latent domain and synthesizes promising airfoil candidates for required aerodynamic performance.
    The Impact of ChatGPT and LLMs on Medical Imaging Stakeholders: Perspectives and Use Cases. (arXiv:2306.06767v2 [eess.IV] UPDATED)
    This study investigates the transformative potential of Large Language Models (LLMs), such as OpenAI ChatGPT, in medical imaging. With the aid of public data, these models, which possess remarkable language understanding and generation capabilities, are augmenting the interpretive skills of radiologists, enhancing patient-physician communication, and streamlining clinical workflows. The paper introduces an analytic framework for presenting the complex interactions between LLMs and the broader ecosystem of medical imaging stakeholders, including businesses, insurance entities, governments, research institutions, and hospitals (nicknamed BIGR-H). Through detailed analyses, illustrative use cases, and discussions on the broader implications and future directions, this perspective seeks to raise discussion in strategic planning and decision-making in the era of AI-enabled healthcare.
    Multi-Similarity Contrastive Learning. (arXiv:2307.02712v1 [cs.LG])
    Given a similarity metric, contrastive methods learn a representation in which examples that are similar are pushed together and examples that are dissimilar are pulled apart. Contrastive learning techniques have been utilized extensively to learn representations for tasks ranging from image classification to caption generation. However, existing contrastive learning approaches can fail to generalize because they do not take into account the possibility of different similarity relations. In this paper, we propose a novel multi-similarity contrastive loss (MSCon) that learns generalizable embeddings by jointly utilizing supervision from multiple metrics of similarity. Our method automatically learns contrastive similarity weightings based on the uncertainty in the corresponding similarity, down-weighting uncertain tasks and leading to better out-of-domain generalization to new tasks. We show empirically that networks trained with MSCon outperform state-of-the-art baselines on in-domain and out-of-domain settings.
    Expert Aggregation for Financial Forecasting. (arXiv:2111.15365v4 [q-fin.ST] UPDATED)
    Machine learning algorithms dedicated to financial time series forecasting have gained a lot of interest. But choosing between several algorithms can be challenging, as their estimation accuracy may be unstable over time. Online aggregation of experts combines the forecasts of a finite set of models in a single approach without making any assumption about the models. In this paper, a Bernstein Online Aggregation (BOA) procedure is applied to the construction of long-short strategies built from individual stock return forecasts coming from different machine learning models. The online mixture of experts leads to attractive portfolio performances even in environments characterised by non-stationarity. The aggregation outperforms individual algorithms, offering a higher portfolio Sharpe ratio and lower shortfall with similar turnover. Extensions to expert and aggregation specialisations are also proposed to improve the overall mixture on a family of portfolio evaluation metrics.
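The mechanics of online expert aggregation can be sketched with the classical exponentially weighted average forecaster, of which BOA is a second-order refinement. The sketch below is that simpler stand-in, not BOA itself: each expert is reweighted by its running loss, and the aggregate prediction is the weighted average of expert forecasts.

```python
import numpy as np

def online_aggregate(forecasts, outcomes, eta=2.0):
    """forecasts: T x K array of K experts' predictions over T rounds;
    outcomes: length-T realized values. Returns aggregated predictions
    from an exponentially weighted average forecaster."""
    T, K = forecasts.shape
    w = np.ones(K) / K                       # uniform prior over experts
    preds = np.empty(T)
    for t in range(T):
        preds[t] = w @ forecasts[t]          # predict before seeing outcome
        loss = (forecasts[t] - outcomes[t]) ** 2
        w *= np.exp(-eta * loss)             # penalize inaccurate experts
        w /= w.sum()
    return preds
```

Over time the weights concentrate on the experts with the lowest cumulative loss, which is what makes the mixture robust to individual models degrading in non-stationary regimes.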
    Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation. (arXiv:2307.02842v1 [cs.LG])
    Risk-sensitive reinforcement learning (RL) aims to optimize policies that balance the expected reward and risk. In this paper, we investigate a novel risk-sensitive RL formulation with an Iterated Conditional Value-at-Risk (CVaR) objective under linear and general function approximations. This new formulation, named ICVaR-RL with function approximation, provides a principled way to guarantee safety at each decision step. For ICVaR-RL with linear function approximation, we propose a computationally efficient algorithm ICVaR-L, which achieves an $\widetilde{O}(\sqrt{\alpha^{-(H+1)}(d^2H^4+dH^6)K})$ regret, where $\alpha$ is the risk level, $d$ is the dimension of state-action features, $H$ is the length of each episode, and $K$ is the number of episodes. We also establish a matching lower bound $\Omega(\sqrt{\alpha^{-(H-1)}d^2K})$ to validate the optimality of ICVaR-L with respect to $d$ and $K$. For ICVaR-RL with general function approximation, we propose algorithm ICVaR-G, which achieves an $\widetilde{O}(\sqrt{\alpha^{-(H+1)}DH^4K})$ regret, where $D$ is a dimensional parameter that depends on the eluder dimension and covering number. Furthermore, our analysis provides several novel techniques for risk-sensitive RL, including an efficient approximation of the CVaR operator, a new ridge regression with CVaR-adapted features, and a refined elliptical potential lemma.
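The CVaR objective at the heart of this formulation has a simple empirical form: the mean of the worst $\alpha$-fraction of outcomes. The sketch below shows that sample estimator only; the paper's contribution is the iterated, function-approximation setting around it.

```python
import numpy as np

def cvar(samples, alpha):
    """Empirical Conditional Value-at-Risk at level alpha: the average of
    the worst alpha-fraction of outcomes (lower tail of returns)."""
    samples = np.sort(np.asarray(samples, dtype=float))
    k = max(1, int(np.ceil(alpha * len(samples))))
    return float(samples[:k].mean())
```

As `alpha` shrinks, the objective focuses on ever more catastrophic outcomes, which is why the regret bounds above carry $\alpha^{-(H+1)}$ factors.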
    Sample-Efficient Learning of POMDPs with Multiple Observations In Hindsight. (arXiv:2307.02884v1 [cs.LG])
    This paper studies the sample-efficiency of learning in Partially Observable Markov Decision Processes (POMDPs), a challenging problem in reinforcement learning that is known to be exponentially hard in the worst-case. Motivated by real-world settings such as loading in game playing, we propose an enhanced feedback model called ``multiple observations in hindsight'', where after each episode of interaction with the POMDP, the learner may collect multiple additional observations emitted from the encountered latent states, but may not observe the latent states themselves. We show that sample-efficient learning under this feedback model is possible for two new subclasses of POMDPs: \emph{multi-observation revealing POMDPs} and \emph{distinguishable POMDPs}. Both subclasses generalize and substantially relax \emph{revealing POMDPs} -- a widely studied subclass for which sample-efficient learning is possible under standard trajectory feedback. Notably, distinguishable POMDPs only require the emission distributions from different latent states to be \emph{different} instead of \emph{linearly independent} as required in revealing POMDPs.
    Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification. (arXiv:2307.03133v1 [cs.LG])
    Test-time adaptation (TTA) is a technique aimed at enhancing the generalization performance of models by leveraging unlabeled samples solely during prediction. Given the need for robustness in neural network systems when faced with distribution shifts, numerous TTA methods have recently been proposed. However, evaluating these methods is often done under different settings, such as varying distribution shifts, backbones, and design scenarios, leading to a lack of consistent and fair benchmarks to validate their effectiveness. To address this issue, we present a benchmark that systematically evaluates 13 prominent TTA methods and their variants on five widely used image classification datasets: CIFAR-10-C, CIFAR-100-C, ImageNet-C, DomainNet, and Office-Home. These methods encompass a wide range of adaptation scenarios (e.g. online adaptation vs. offline adaptation, instance adaptation vs. batch adaptation vs. domain adaptation). Furthermore, we explore the compatibility of different TTA methods with diverse network backbones. To implement this benchmark, we have developed a unified framework in PyTorch, which allows for consistent evaluation and comparison of the TTA methods across the different datasets and network architectures. By establishing this benchmark, we aim to provide researchers and practitioners with a reliable means of assessing and comparing the effectiveness of TTA methods in improving model robustness and generalization performance. Our code is available at https://github.com/yuyongcan/Benchmark-TTA.
    Adapting Neural Link Predictors for Complex Query Answering. (arXiv:2301.12313v2 [cs.LG] UPDATED)
    Answering complex queries on incomplete knowledge graphs is a challenging task where a model needs to answer complex logical queries in the presence of missing knowledge. Recently, Arakelyan et al. (2021); Minervini et al. (2022) showed that neural link predictors could also be used for answering complex queries: their Continuous Query Decomposition (CQD) method works by decomposing complex queries into atomic sub-queries, answers them using neural link predictors and aggregates their scores via t-norms for ranking the answers to each complex query. However, CQD does not handle negations and only uses the training signal from atomic training queries: neural link prediction scores are not calibrated to interact together via fuzzy logic t-norms during complex query answering. In this work, we propose to address this problem by training a parameter-efficient score adaptation model to re-calibrate neural link prediction scores: this new component is trained on complex queries by back-propagating through the complex query-answering process. Our method, CQD$^{A}$, produces significantly more accurate results than current state-of-the-art methods, improving from $34.4$ to $35.1$ Mean Reciprocal Rank values averaged across all datasets and query types while using $\leq 35\%$ of the available training query types. We further show that CQD$^{A}$ is data-efficient, achieving competitive results with only $1\%$ of the training data, and robust in out-of-domain evaluations.
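The t-norm aggregation step that CQD builds on is easy to state concretely: atomic link-prediction scores in $[0, 1]$ are combined with a t-norm for conjunctions. The sketch below shows the two standard choices (product and Gödel/minimum); the calibration model that CQD$^{A}$ adds on top is not reproduced here.

```python
import numpy as np

def product_tnorm(scores):
    """Product t-norm: fuzzy conjunction of atomic sub-query scores."""
    return float(np.prod(scores))

def godel_tnorm(scores):
    """Goedel (minimum) t-norm: the weakest atom bounds the conjunction."""
    return float(np.min(scores))

def score_conjunctive_query(atom_scores, tnorm=product_tnorm):
    """Score a complex conjunctive query by aggregating the (calibrated)
    neural link prediction scores of its atoms."""
    return tnorm(np.asarray(atom_scores, dtype=float))
```

Because uncalibrated scores from independently trained atoms need not be comparable, multiplying or minimizing them can distort rankings, which is precisely the failure mode the adaptation model in CQD$^{A}$ addresses.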
    Active Policy Improvement from Multiple Black-box Oracles. (arXiv:2306.10259v2 [cs.LG] UPDATED)
    Reinforcement learning (RL) has made significant strides in various complex domains. However, identifying an effective policy via RL often necessitates extensive exploration. Imitation learning aims to mitigate this issue by using expert demonstrations to guide exploration. In real-world scenarios, one often has access to multiple suboptimal black-box experts, rather than a single optimal oracle. These experts do not universally outperform each other across all states, presenting a challenge in actively deciding which oracle to use and in which state. We introduce MAPS and MAPS-SE, a class of policy improvement algorithms that perform imitation learning from multiple suboptimal oracles. In particular, MAPS actively selects which of the oracles to imitate and improves their value function estimates, and MAPS-SE additionally leverages an active state exploration criterion to determine which states one should explore. We provide a comprehensive theoretical analysis and demonstrate that MAPS and MAPS-SE enjoy a sample-efficiency advantage over state-of-the-art policy improvement algorithms. Empirical results show that MAPS-SE significantly accelerates policy optimization via state-wise imitation learning from multiple oracles across a broad spectrum of control tasks in the DeepMind Control Suite. Our code is publicly available at: https://github.com/ripl/maps.
    How accurate are existing land cover maps for agriculture in Sub-Saharan Africa?. (arXiv:2307.02575v1 [cs.LG])
    Satellite Earth observations (EO) can provide affordable and timely information for assessing crop conditions and food production. Such monitoring systems are essential in Africa, where there is high food insecurity and sparse agricultural statistics. EO-based monitoring systems require accurate cropland maps to provide information about croplands, but there is a lack of data to determine which of the many available land cover maps most accurately identify cropland in African countries. This study provides a quantitative evaluation and intercomparison of 11 publicly available land cover maps to assess their suitability for cropland classification and EO-based agriculture monitoring in Africa using statistically rigorous reference datasets from 8 countries. We hope the results of this study will help users determine the most suitable map for their needs and encourage future work to focus on resolving inconsistencies between maps and improving accuracy in low-accuracy regions.
    Few-Shot Personalized Saliency Prediction Using Tensor Regression for Preserving Structural Global Information. (arXiv:2307.02799v1 [eess.IV])
    This paper presents few-shot personalized saliency prediction using tensor-to-matrix regression that preserves the structural global information of personalized saliency maps (PSMs). In contrast to a general saliency map, a PSM holds great potential because it indicates person-specific visual attention, which is useful for obtaining individual visual preferences from the heterogeneity of gazed areas. PSM prediction is needed to acquire the PSM for an unseen image, but it remains a challenging task due to the complexity of individual gaze patterns. To recognize individual gaze patterns from a limited amount of eye-tracking data, previous methods exploit the similarity of gaze tendencies between persons. However, these methods vectorize the PSMs for the prediction model, so the structural global information of the PSMs corresponding to the image is ignored. To automatically reveal the relationships between PSMs, we focus on a tensor-based regression model that can preserve the structural information of PSMs and thereby improve prediction accuracy. Experimental results confirm that the proposed method, which includes the tensor-based regression, outperforms comparative methods.
    Duality in Multi-View Restricted Kernel Machines. (arXiv:2305.17251v2 [cs.LG] UPDATED)
    We propose a unifying setting that combines existing restricted kernel machine methods into a single primal-dual multi-view framework for kernel principal component analysis in both supervised and unsupervised settings. We derive the primal and dual representations of the framework and relate different training and inference algorithms from a theoretical perspective. We show how to achieve full equivalence in primal and dual formulations by rescaling primal variables. Finally, we experimentally validate the equivalence and provide insight into the relationships between different methods on a number of time series data sets by recursively forecasting unseen test data and visualizing the learned features.
    Multi-gauge Hydrological Variational Data Assimilation: Regionalization Learning with Spatial Gradients using Multilayer Perceptron and Bayesian-Guided Multivariate Regression. (arXiv:2307.02497v1 [cs.LG])
    Tackling the difficult problem of estimating spatially distributed hydrological parameters, especially for floods on ungauged watercourses, this contribution presents a novel seamless regionalization technique for learning complex regional transfer functions designed for high-resolution hydrological models. The transfer functions rely on: (i) a multilayer perceptron enabling a seamless flow of gradient computation to employ machine learning optimization algorithms, or (ii) a multivariate regression mapping optimized by variational data assimilation algorithms and guided by Bayesian estimation, addressing the equifinality issue of feasible solutions. The approach involves incorporating the inferable regionalization mappings into a differentiable hydrological model and optimizing a cost function computed on multi-gauge data with accurate adjoint-based spatially distributed gradients.
    Balanced Filtering via Non-Disclosive Proxies. (arXiv:2306.15083v2 [cs.LG] UPDATED)
    We study the problem of non-disclosively collecting a sample of data that is balanced with respect to sensitive groups when group membership is unavailable or prohibited from use at collection time. Specifically, our collection mechanism does not reveal significantly more about group membership of any individual sample than can be ascertained from base rates alone. To do this, we adopt a fairness pipeline perspective, in which a learner can use a small set of labeled data to train a proxy function that can later be used for this filtering task. We then associate the range of the proxy function with sampling probabilities; given a new candidate, we classify it using our proxy function, and then select it for our sample with probability proportional to the sampling probability corresponding to its proxy classification. Importantly, we require that the proxy classification itself not reveal significant information about the sensitive group membership of any individual sample (i.e., it should be sufficiently non-disclosive). We show that under modest algorithmic assumptions, we find such a proxy in a sample- and oracle-efficient manner. Finally, we experimentally evaluate our algorithm and analyze generalization properties.
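The filtering mechanism described above can be sketched in a few lines; the proxy function, the bucket names, and the per-bucket sampling probabilities below are hypothetical stand-ins (in the paper the proxy is learned from a small labeled set and is required to be non-disclosive):

```python
import random

# Hypothetical proxy output: a coarse bucket derived from non-sensitive features,
# with per-bucket sampling probabilities chosen to rebalance the collected sample.
sampling_prob = {"bucket_a": 0.25, "bucket_b": 1.0}

def proxy(x):
    # Stand-in for the learned proxy function.
    return "bucket_a" if x["surrogate"] > 0.5 else "bucket_b"

def filter_stream(candidates, rng):
    sample = []
    for x in candidates:
        # Select with probability tied to the proxy classification, never to
        # the (unavailable) sensitive group membership itself.
        if rng.random() < sampling_prob[proxy(x)]:
            sample.append(x)
    return sample

gen = random.Random(0)
stream = [{"surrogate": gen.random()} for _ in range(1000)]
collected = filter_stream(stream, random.Random(1))
```

The non-disclosiveness requirement lives in the proxy itself: observing which bucket a candidate landed in should reveal little more about its group membership than the base rates do.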
    Stability of Q-Learning Through Design and Optimism. (arXiv:2307.02632v1 [cs.LG])
    Q-learning has become an important part of the reinforcement learning toolkit since its introduction in the dissertation of Chris Watkins in the 1980s. The purpose of this paper is in part a tutorial on stochastic approximation and Q-learning, providing details regarding the INFORMS APS inaugural Applied Probability Trust Plenary Lecture, presented in Nancy France, June 2023. The paper also presents new approaches to ensure stability and potentially accelerated convergence for these algorithms, and stochastic approximation in other settings. Two contributions are entirely new: 1. Stability of Q-learning with linear function approximation has been an open topic for research for over three decades. It is shown that with appropriate optimistic training in the form of a modified Gibbs policy, there exists a solution to the projected Bellman equation, and the algorithm is stable (in terms of bounded parameter estimates). Convergence remains one of many open topics for research. 2. The new Zap Zero algorithm is designed to approximate the Newton-Raphson flow without matrix inversion. It is stable and convergent under mild assumptions on the mean flow vector field for the algorithm, and compatible statistical assumption on an underlying Markov chain. The algorithm is a general approach to stochastic approximation which in particular applies to Q-learning with "oblivious" training even with non-linear function approximation.
    Origin-Destination Travel Time Oracle for Map-based Services. (arXiv:2307.03048v1 [cs.LG])
    Given an origin (O), a destination (D), and a departure time (T), an Origin-Destination (OD) travel time oracle (ODT-Oracle) returns an estimate of the time it takes to travel from O to D when departing at T. ODT-Oracles serve important purposes in map-based services. To enable the construction of such oracles, we provide a travel-time estimation (TTE) solution that leverages historical trajectories to estimate time-varying travel times for OD pairs. The problem is complicated by the fact that multiple historical trajectories with different travel times may connect an OD pair, while trajectories may vary from one another. To solve the problem, it is crucial to remove outlier trajectories when doing travel time estimation for future queries. We propose a novel, two-stage framework called Diffusion-based Origin-destination Travel Time Estimation (DOT), which solves the problem. First, DOT employs a conditioned Pixelated Trajectories (PiT) denoiser that enables building a diffusion-based PiT inference process by learning correlations between OD pairs and historical trajectories. Specifically, given an OD pair and a departure time, we aim to infer a PiT. Next, DOT encompasses a Masked Vision Transformer (MViT) that effectively and efficiently estimates a travel time based on the inferred PiT. We report on extensive experiments on two real-world datasets that offer evidence that DOT is capable of outperforming baseline methods in terms of accuracy, scalability, and explainability.
    Learning Curves for Heterogeneous Feature-Subsampled Ridge Ensembles. (arXiv:2307.03176v1 [stat.ML])
    Feature bagging is a well-established ensembling method which aims to reduce prediction variance by training estimators in an ensemble on random subsamples or projections of features. Typically, ensembles are chosen to be homogeneous, in the sense that the number of feature dimensions available to an estimator is uniform across the ensemble. Here, we introduce heterogeneous feature ensembling, with estimators built on varying numbers of feature dimensions, and consider its performance in a linear regression setting. We study an ensemble of linear predictors, each fit using ridge regression on a subset of the available features. We allow the number of features included in these subsets to vary. Using the replica trick from statistical physics, we derive learning curves for ridge ensembles with deterministic linear masks. We obtain explicit expressions for the learning curves in the case of equicorrelated data with an isotropic feature noise. Using the derived expressions, we investigate the effect of subsampling and ensembling, finding sharp transitions in the optimal ensembling strategy in the parameter space of noise level, data correlations, and data-task alignment. Finally, we suggest variable-dimension feature bagging as a strategy to mitigate double descent for robust machine learning in practice.
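A minimal numerical sketch of heterogeneous feature ensembling, with ridge members fit on feature subsets of different sizes (synthetic data; the paper's analysis uses deterministic linear masks and the replica trick, which this toy example does not reproduce):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 30
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)

def ridge_fit(Xs, y, lam=1.0):
    # Closed-form ridge solution on a feature subset.
    k = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + lam * np.eye(k), Xs.T @ y)

# Heterogeneous ensemble: members see feature subsets of *different* sizes.
subset_sizes = [5, 10, 20]
members = [(idx, ridge_fit(X[:, idx], y))
           for idx in (rng.choice(d, size=k, replace=False) for k in subset_sizes)]

def ensemble_predict(Xq):
    # Equal-weight average of the member predictions.
    return np.mean([Xq[:, idx] @ w for idx, w in members], axis=0)

X_test = rng.normal(size=(50, d))
y_test = X_test @ w_true
pred = ensemble_predict(X_test)
```

Varying `subset_sizes` is exactly the heterogeneity knob the paper studies analytically; a homogeneous ensemble would use a single repeated size.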
    Wasserstein Auto-Encoders of Merge Trees (and Persistence Diagrams). (arXiv:2307.02509v1 [cs.LG])
    This paper presents a computational framework for the Wasserstein auto-encoding of merge trees (MT-WAE), a novel extension of the classical auto-encoder neural network architecture to the Wasserstein metric space of merge trees. In contrast to traditional auto-encoders which operate on vectorized data, our formulation explicitly manipulates merge trees on their associated metric space at each layer of the network, resulting in superior accuracy and interpretability. Our novel neural network approach can be interpreted as a non-linear generalization of previous linear attempts [65] at merge tree encoding. It also trivially extends to persistence diagrams. Extensive experiments on public ensembles demonstrate the efficiency of our algorithms, with MT-WAE computations in the orders of minutes on average. We show the utility of our contributions in two applications adapted from previous work on merge tree encoding [65]. First, we apply MT-WAE to data reduction and reliably compress merge trees by concisely representing them with their coordinates in the final layer of our auto-encoder. Second, we document an application to dimensionality reduction, by exploiting the latent space of our auto-encoder, for the visual analysis of ensemble data. We illustrate the versatility of our framework by introducing two penalty terms, to help preserve in the latent space both the Wasserstein distances between merge trees, as well as their clusters. In both applications, quantitative experiments assess the relevance of our framework. Finally, we provide a C++ implementation that can be used for reproducibility.
    Temporal Difference Learning for High-Dimensional PIDEs with Jumps. (arXiv:2307.02766v1 [math.NA])
    In this paper, we propose a deep learning framework for solving high-dimensional partial integro-differential equations (PIDEs) based on the temporal difference learning. We introduce a set of Levy processes and construct a corresponding reinforcement learning model. To simulate the entire process, we use deep neural networks to represent the solutions and non-local terms of the equations. Subsequently, we train the networks using the temporal difference error, termination condition, and properties of the non-local terms as the loss function. The relative error of the method reaches O(10^{-3}) in 100-dimensional experiments and O(10^{-4}) in one-dimensional pure jump problems. Additionally, our method demonstrates the advantages of low computational cost and robustness, making it well-suited for addressing problems with different forms and intensities of jumps.
    A Privacy-Preserving Walk in the Latent Space of Generative Models for Medical Applications. (arXiv:2307.02984v1 [cs.LG])
    Generative Adversarial Networks (GANs) have demonstrated their ability to generate synthetic samples that match a target distribution. However, from a privacy perspective, using GANs as a proxy for data sharing is not a safe solution, as they tend to embed near-duplicates of real samples in the latent space. Recent works, inspired by k-anonymity principles, address this issue through sample aggregation in the latent space, with the drawback of reducing the dataset by a factor of k. Our work aims to mitigate this problem by proposing a latent space navigation strategy able to generate diverse synthetic samples that may support effective training of deep models, while addressing privacy concerns in a principled way. Our approach leverages an auxiliary identity classifier as a guide to non-linearly walk between points in the latent space, minimizing the risk of collision with near-duplicates of real samples. We empirically demonstrate that, given any random pair of points in the latent space, our walking strategy is safer than linear interpolation. We then test our path-finding strategy combined with k-same methods and demonstrate, on two benchmarks for tuberculosis and diabetic retinopathy classification, that training a model using samples generated by our approach mitigates drops in performance while preserving privacy.
    Diffusion Models for Computational Design at the Example of Floor Plans. (arXiv:2307.02511v1 [cs.LG])
    AI image generators based on diffusion models have recently been widely discussed for their capability to create images from simple text prompts. However, for practical use in civil engineering, they need to be able to create specific construction plans for given constraints. Within this paper we explore the capabilities of these diffusion-based AI generators for computational design using the example of floor plans and identify their current limitations. We explain how diffusion models work and propose new diffusion models with improved semantic encoding. In several experiments we show that we can improve the validity of generated floor plans from 6% to 90%, as well as query performance, for different examples. We identify shortcomings, derive future research challenges for these models, and discuss the need to combine diffusion models with building information modelling. With this we provide key insights into the current state and future directions for diffusion models in civil engineering.
    Quantum Solutions to the Privacy vs. Utility Tradeoff. (arXiv:2307.03118v1 [quant-ph])
    In this work, we propose a novel architecture (and several variants thereof) based on quantum cryptographic primitives with provable privacy and security guarantees regarding membership inference attacks on generative models. Our architecture can be used on top of any existing classical or quantum generative models. We argue that the use of quantum gates associated with unitary operators provides inherent advantages compared to standard Differential Privacy based techniques for establishing guaranteed security from all polynomial-time adversaries.
    REAL: A Representative Error-Driven Approach for Active Learning. (arXiv:2307.00968v2 [cs.LG] UPDATED)
    Given a limited labeling budget, active learning (AL) aims to sample the most informative instances from an unlabeled pool to acquire labels for subsequent model training. To achieve this, AL typically measures the informativeness of unlabeled instances based on uncertainty and diversity. However, it does not consider erroneous instances with their neighborhood error density, which have great potential to improve the model performance. To address this limitation, we propose $REAL$, a novel approach to select data instances with $\underline{R}$epresentative $\underline{E}$rrors for $\underline{A}$ctive $\underline{L}$earning. It identifies minority predictions as \emph{pseudo errors} within a cluster and allocates an adaptive sampling budget for the cluster based on estimated error density. Extensive experiments on five text classification datasets demonstrate that $REAL$ consistently outperforms all best-performing baselines regarding accuracy and F1-macro scores across a wide range of hyperparameter settings. Our analysis also shows that $REAL$ selects the most representative pseudo errors that match the distribution of ground-truth errors along the decision boundary. Our code is publicly available at https://github.com/withchencheng/ECML_PKDD_23_Real.
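The pseudo-error and budget-allocation idea can be sketched on toy clusters (the cluster contents and budget below are invented; REAL's actual clustering and density estimation are more involved):

```python
from collections import Counter

# Toy clusters of unlabeled instances, each shown with the model's prediction.
clusters = {
    0: ["pos", "pos", "pos", "neg"],         # minority "neg" -> pseudo error
    1: ["neg", "neg", "neg", "neg", "pos"],  # minority "pos" -> pseudo error
    2: ["pos", "pos"],                       # unanimous -> no pseudo errors
}

def pseudo_errors(preds):
    # Minority predictions within a cluster are treated as pseudo errors.
    majority, _ = Counter(preds).most_common(1)[0]
    return [p for p in preds if p != majority]

density = {c: len(pseudo_errors(p)) / len(p) for c, p in clusters.items()}

# Adaptive budget: spend labeling effort proportionally to estimated error density.
budget = 6
total = sum(density.values())
allocation = {c: round(budget * rho / total) for c, rho in density.items()}
```

The contrast with plain uncertainty or diversity sampling is visible in cluster 2: a unanimous cluster receives no budget regardless of how uncertain its individual predictions are.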
    GIT: Detecting Uncertainty, Out-Of-Distribution and Adversarial Samples using Gradients and Invariance Transformations. (arXiv:2307.02672v1 [cs.LG])
    Deep neural networks tend to make overconfident predictions and often require additional detectors for misclassifications, particularly for safety-critical applications. Existing detection methods usually only focus on adversarial attacks or out-of-distribution samples as reasons for false predictions. However, generalization errors occur due to diverse reasons often related to poorly learning relevant invariances. We therefore propose GIT, a holistic approach for the detection of generalization errors that combines the usage of gradient information and invariance transformations. The invariance transformations are designed to shift misclassified samples back into the generalization area of the neural network, while the gradient information measures the contradiction between the initial prediction and the corresponding inherent computations of the neural network using the transformed sample. Our experiments demonstrate the superior performance of GIT compared to the state-of-the-art on a variety of network architectures, problem setups and perturbation types.
    A Finite-Particle Convergence Rate for Stein Variational Gradient Descent. (arXiv:2211.09721v4 [cs.LG] UPDATED)
    We provide the first finite-particle convergence rate for Stein variational gradient descent (SVGD), a popular algorithm for approximating a probability distribution with a collection of particles. Specifically, whenever the target distribution is sub-Gaussian with a Lipschitz score, SVGD with n particles and an appropriate step size sequence drives the kernel Stein discrepancy to zero at an order 1/sqrt(log log n) rate. We suspect that the dependence on n can be improved, and we hope that our explicit, non-asymptotic proof strategy will serve as a template for future refinements.
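For reference, a standard SVGD update with an RBF kernel looks like the following numpy sketch (the target, bandwidth, particle count, and step size are illustrative; the paper analyzes this algorithm's convergence rather than proposing it):

```python
import numpy as np

def svgd_step(particles, score, lr=0.1, h=1.0):
    """One SVGD update with an RBF kernel of bandwidth h."""
    n = particles.shape[0]
    diff = particles[:, None, :] - particles[None, :, :]   # x_i - x_j, shape (n, n, d)
    K = np.exp(-(diff ** 2).sum(-1) / (2 * h))             # kernel matrix
    attract = K @ score(particles)                         # drives toward the target
    repulse = (diff * K[:, :, None]).sum(axis=1) / h       # keeps particles spread out
    return particles + lr * (attract + repulse) / n

# Target: standard Gaussian, whose score function is score(x) = -x.
score = lambda x: -x
rng = np.random.default_rng(0)
parts = rng.normal(loc=5.0, size=(50, 1))   # initialize far from the target
for _ in range(200):
    parts = svgd_step(parts, score)
```

After the loop the particle cloud should have drifted toward the target's mean while the kernel-gradient term prevents collapse onto a single point; the paper's rate bounds the kernel Stein discrepancy of exactly this kind of finite-particle system.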
    Parameter-Efficient Fine-Tuning of LLaMA for the Clinical Domain. (arXiv:2307.03042v1 [cs.CL])
    Adapting pretrained language models to novel domains, such as clinical applications, traditionally involves retraining their entire set of parameters. However, this approach is increasingly proven to be impractical owing to the substantial computational requirements associated with training such large language models. To address this issue, Parameter-Efficient Fine-Tuning (PEFT) techniques offer a viable solution by selectively fine-tuning a small subset of additional parameters, significantly reducing the computational requirements for domain adaptation. In this study, we propose Clinical LLaMA-LoRA, a PEFT adapter layer built upon the open-sourced LLaMA model. Clinical LLaMA-LoRA is trained using clinical notes obtained from the MIMIC-IV database, thereby creating a specialised adapter designed for the clinical domain. Additionally, we propose a two-step PEFT framework which fuses Clinical LLaMA-LoRA with Downstream LLaMA-LoRA, another PEFT adapter specialised for downstream tasks. We evaluate this framework on multiple clinical outcome prediction datasets, comparing it to clinically trained language models. Our proposed framework achieves a state-of-the-art AUROC score averaged across all clinical downstream tasks. We observe substantial improvements of 6-9% AUROC score in the large-scale multilabel classification tasks, such as diagnoses and procedures classification.
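The LoRA-style adapter underlying Clinical LLaMA-LoRA can be sketched in a few lines: the pretrained weight stays frozen and only a low-rank update is trained. The dimensions below are illustrative, and the zero initialization of the up-projection follows the common LoRA convention so that the adapter is a no-op before fine-tuning:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4                            # model width, adapter rank (r << d)

W0 = rng.normal(size=(d, d))            # pretrained weight: frozen
A = 0.01 * rng.normal(size=(r, d))      # trainable down-projection
B = np.zeros((d, r))                    # trainable up-projection, zero at init

def forward(x, alpha=1.0):
    # LoRA forward pass: y = W0 x + alpha * B(Ax); only A and B are trained.
    return W0 @ x + alpha * (B @ (A @ x))

x = rng.normal(size=d)
y_init = forward(x)                     # identical to the frozen model at init

lora_params = A.size + B.size           # 2 * r * d trainable parameters
full_params = W0.size                   # d * d for full fine-tuning
```

The parameter count is where the "parameter-efficient" claim comes from: the adapter trains 2rd values per weight matrix instead of d squared, which is what makes domain adaptation of a model the size of LLaMA tractable.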
    Generalizing Backpropagation for Gradient-Based Interpretability. (arXiv:2307.03056v1 [cs.LG])
    Many popular feature-attribution methods for interpreting deep neural networks rely on computing the gradients of a model's output with respect to its inputs. While these methods can indicate which input features may be important for the model's prediction, they reveal little about the inner workings of the model itself. In this paper, we observe that the gradient computation of a model is a special case of a more general formulation using semirings. This observation allows us to generalize the backpropagation algorithm to efficiently compute other interpretable statistics about the gradient graph of a neural network, such as the highest-weighted path and entropy. We implement this generalized algorithm, evaluate it on synthetic datasets to better understand the statistics it computes, and apply it to study BERT's behavior on the subject-verb number agreement task (SVA). With this method, we (a) validate that the amount of gradient flow through a component of a model reflects its importance to a prediction and (b) for SVA, identify which pathways of the self-attention mechanism are most important.
    CPDG: A Contrastive Pre-Training Method for Dynamic Graph Neural Networks. (arXiv:2307.02813v1 [cs.LG])
    Dynamic graph data mining has gained popularity in recent years due to the rich information contained in dynamic graphs and their widespread use in the real world. Despite the advances in dynamic graph neural networks (DGNNs), the rich information and diverse downstream tasks have posed significant difficulties for the practical application of DGNNs in industrial scenarios. To this end, in this paper, we propose to address them by pre-training and present the Contrastive Pre-Training Method for Dynamic Graph Neural Networks (CPDG). CPDG tackles the challenges of pre-training for DGNNs, including generalization and long-short term modeling capability, through a flexible structural-temporal subgraph sampler along with structural-temporal contrastive pre-training schemes. Extensive experiments conducted on both large-scale research and industrial dynamic graph datasets show that CPDG outperforms existing methods in dynamic graph pre-training for various downstream tasks under three transfer settings.
    Convex-Concave Min-Max Stackelberg Games. (arXiv:2110.05192v8 [cs.GT] UPDATED)
    Min-max optimization problems (i.e., min-max games) have been attracting a great deal of attention because of their applicability to a wide range of machine learning problems. Although significant progress has been made recently, the literature to date has focused on games with independent strategy sets; little is known about solving games with dependent strategy sets, which can be characterized as min-max Stackelberg games. We introduce two first-order methods that solve a large class of convex-concave min-max Stackelberg games, and show that our methods converge in polynomial time. Min-max Stackelberg games were first studied by Wald, under the posthumous name of Wald's maximin model, a variant of which is the main paradigm used in robust optimization, which means that our methods can likewise solve many convex robust optimization problems. We observe that the computation of competitive equilibria in Fisher markets also comprises a min-max Stackelberg game. Further, we demonstrate the efficacy and efficiency of our algorithms in practice by computing competitive equilibria in Fisher markets with varying utility structures. Our experiments suggest potential ways to extend our theoretical results, by demonstrating how different smoothness properties can affect the convergence rate of our algorithms.
    Modeling Content Creator Incentives on Algorithm-Curated Platforms. (arXiv:2206.13102v2 [cs.GT] UPDATED)
    Content creators compete for user attention. Their reach crucially depends on algorithmic choices made by developers on online platforms. To maximize exposure, many creators adapt strategically, as evidenced by examples like the sprawling search engine optimization industry. This begets competition for the finite user attention pool. We formalize these dynamics in what we call an exposure game, a model of incentives induced by algorithms, including modern factorization and (deep) two-tower architectures. We prove that seemingly innocuous algorithmic choices, e.g., non-negative vs. unconstrained factorization, significantly affect the existence and character of (Nash) equilibria in exposure games. We proffer use of creator behavior models, like exposure games, for an (ex-ante) pre-deployment audit. Such an audit can identify misalignment between desirable and incentivized content, and thus complement post-hoc measures like content filtering and moderation. To this end, we propose tools for numerically finding equilibria in exposure games, and illustrate results of an audit on the MovieLens and LastFM datasets. Among other findings, the strategically produced content exhibits a strong dependence between algorithmic exploration and content diversity, and between model expressivity and bias towards gender-based user and creator groups.
    Distilling Large Vision-Language Model with Out-of-Distribution Generalizability. (arXiv:2307.03135v1 [cs.CV])
    Large vision-language models have achieved outstanding performance, but their size and computational requirements make their deployment on resource-constrained devices and time-sensitive tasks impractical. Model distillation, the process of creating smaller, faster models that maintain the performance of larger models, is a promising direction towards the solution. This paper investigates the distillation of visual representations in large teacher vision-language models into lightweight student models using a small- or mid-scale dataset. Notably, this study focuses on open-vocabulary out-of-distribution (OOD) generalization, a challenging problem that has been overlooked in previous model distillation literature. We propose two principles from vision and language modality perspectives to enhance the student's OOD generalization: (1) by better imitating the teacher's visual representation space, and carefully promoting better coherence in vision-language alignment with the teacher; (2) by enriching the teacher's language representations with informative and fine-grained semantic attributes to effectively distinguish between different labels. We propose several metrics and conduct extensive experiments to investigate these techniques. The results demonstrate significant improvements in zero-shot and few-shot student performance on open-vocabulary out-of-distribution classification, highlighting the effectiveness of our proposed approaches. Our code will be released at https://github.com/xuanlinli17/large_vlm_distillation_ood
    Evaluating the Evaluators: Are Current Few-Shot Learning Benchmarks Fit for Purpose?. (arXiv:2307.02732v1 [cs.LG])
    Numerous benchmarks for Few-Shot Learning have been proposed in the last decade. However, all of these benchmarks focus on performance averaged over many tasks, and the question of how to reliably evaluate and tune models trained for individual tasks in this regime has not been addressed. This paper presents the first investigation into task-level evaluation -- a fundamental step when deploying a model. We measure the accuracy of performance estimators in the few-shot setting, consider strategies for model selection, and examine the reasons for the failure of evaluators usually thought of as being robust. We conclude that cross-validation with a low number of folds is the best choice for directly estimating the performance of a model, whereas using bootstrapping or cross-validation with a large number of folds is better for model selection purposes. Overall, we find that existing benchmarks for few-shot learning are not designed in such a way that one can get a reliable picture of how effectively methods can be used on individual tasks.
    Fairness in Multi-Task Learning via Wasserstein Barycenters. (arXiv:2306.10155v2 [stat.ML] UPDATED)
    Algorithmic Fairness is an established field in machine learning that aims to reduce biases in data. Recent advances have proposed various methods to ensure fairness in a univariate environment, where the goal is to de-bias a single task. However, extending fairness to a multi-task setting, where more than one objective is optimised using a shared representation, remains underexplored. To bridge this gap, we develop a method that extends the definition of Strong Demographic Parity to multi-task learning using multi-marginal Wasserstein barycenters. Our approach provides a closed form solution for the optimal fair multi-task predictor including both regression and binary classification tasks. We develop a data-driven estimation procedure for the solution and run numerical experiments on both synthetic and real datasets. The empirical results highlight the practical value of our post-processing methodology in promoting fair decision-making.
    Dynamical Isometry based Rigorous Fair Neural Architecture Search. (arXiv:2307.02263v2 [cs.LG] UPDATED)
    Recently, the weight-sharing technique has significantly sped up the training and evaluation procedure of neural architecture search. However, most existing weight-sharing strategies are solely based on experience or observation, which makes the searching results lack interpretability and rationality. In addition, due to the negligence of fairness, current methods are prone to make misjudgments in module evaluation. To address these problems, we propose a novel neural architecture search algorithm based on dynamical isometry. We use fixed-point analysis from mean-field theory to analyze the dynamical behavior of steady-state random neural networks, and show how dynamical isometry guarantees the fairness of weight-sharing based NAS. Meanwhile, we prove that our module selection strategy is rigorously fair by estimating the generalization error of all modules with a well-conditioned Jacobian. Extensive experiments show that, with the same size, the architecture searched by the proposed method can achieve state-of-the-art top-1 validation accuracy on ImageNet classification. In addition, we demonstrate that our method is able to achieve better and more stable training performance without loss of generality.
    Understanding Uncertainty Sampling. (arXiv:2307.02719v1 [cs.LG])
    Uncertainty sampling is a prevalent active learning algorithm that sequentially queries annotations for the data samples about which the current prediction model is most uncertain. However, its usage has been largely heuristic: (i) there is no consensus on the proper definition of "uncertainty" for a specific task under a specific loss; (ii) there is no theoretical guarantee that prescribes a standard protocol for implementing the algorithm, for example, how to handle sequentially arriving annotated data under the framework of optimization algorithms such as stochastic gradient descent. In this work, we systematically examine uncertainty sampling algorithms under both stream-based and pool-based active learning. We propose a notion of equivalent loss, which depends on the uncertainty measure used and the original loss function, and establish that an uncertainty sampling algorithm essentially optimizes against such an equivalent loss. This perspective verifies the properness of existing uncertainty measures from two aspects: surrogate property and loss convexity. Furthermore, we propose a new principle for designing uncertainty measures called \textit{loss as uncertainty}: use the conditional expected loss given the features as the uncertainty measure. Such a measure has nice analytical properties and enough generality to cover both classification and regression problems, enabling us to provide the first generalization bound for uncertainty sampling algorithms under both stream-based and pool-based settings, in full generality of the underlying model and problem. Lastly, we establish connections between certain variants of uncertainty sampling and risk-sensitive objectives and distributional robustness, which can partly explain the advantage of uncertainty sampling algorithms when the sample size is small.
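    A minimal pool-based uncertainty sampling loop with the least-confidence measure, one common (heuristic) choice of "uncertainty" of the kind the analysis above scrutinizes. The tiny gradient-descent logistic regression, synthetic blobs, seed set, and query budget are illustrative assumptions.

```python
import numpy as np

def fit_logreg(X, y, iters=300, lr=0.5):
    """Tiny logistic regression via gradient descent (a stand-in for any
    probabilistic classifier)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])     # add a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(X)
    return w

def predict_proba(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    p = 1.0 / (1.0 + np.exp(-Xb @ w))
    return np.column_stack([1.0 - p, p])

rng = np.random.default_rng(0)
# Synthetic binary pool: two Gaussian blobs.
X = np.vstack([rng.normal(-1.0, 1.0, (500, 2)), rng.normal(1.0, 1.0, (500, 2))])
y = np.repeat([0.0, 1.0], 500)

labeled = list(range(5)) + list(range(995, 1000))   # seed set with both classes
pool = [i for i in range(len(X)) if i not in labeled]

for _ in range(20):                                 # 20 query rounds
    w = fit_logreg(X[labeled], y[labeled])
    proba = predict_proba(w, X[pool])
    # Least-confidence measure: query the pool sample whose top-class
    # probability is smallest (closest to the decision boundary).
    idx = int(np.argmin(proba.max(axis=1)))
    labeled.append(pool.pop(idx))

w = fit_logreg(X[labeled], y[labeled])
accuracy = float((predict_proba(w, X).argmax(axis=1) == y).mean())
```

    Other measures (margin, entropy) only change the `argmin` criterion; the paper's "loss as uncertainty" principle would replace it with the conditional expected loss.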
    VerifAI: Verified Generative AI. (arXiv:2307.02796v1 [cs.DB])
    Generative AI has made significant strides, yet concerns about the accuracy and reliability of its outputs continue to grow. Such inaccuracies can have serious consequences, including flawed decision-making, the spread of false information, privacy violations, and legal liabilities. Although efforts to address these risks are underway, including explainable AI and responsible AI practices such as transparency, privacy protection, bias mitigation, and social and environmental responsibility, misinformation caused by generative AI will remain a significant challenge. We propose that verifying the outputs of generative AI from a data management perspective is an emerging and critical issue. This involves analyzing the underlying data from multi-modal data lakes, including text files, tables, and knowledge graphs, and assessing its quality and consistency. By doing so, we can establish a stronger foundation for evaluating the outputs of generative AI models, ensure their correctness, promote transparency, and enable decision-making with greater confidence. Our vision is to promote the development of verifiable generative AI and contribute to a more trustworthy and responsible use of AI.
    Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization. (arXiv:2303.03108v3 [cs.LG] UPDATED)
    Recently, flat minima have been shown to be effective for improving generalization, and sharpness-aware minimization (SAM) achieves state-of-the-art performance. Yet the definition of flatness discussed in SAM and its follow-ups is limited to zeroth-order flatness (i.e., the worst-case loss within a perturbation radius). We show that zeroth-order flatness can be insufficient to discriminate minima with low generalization error from those with high generalization error, whether there is a single minimum or multiple minima within the given perturbation radius. We therefore present first-order flatness, a stronger flatness measure based on the maximal gradient norm within a perturbation radius, which bounds both the maximal eigenvalue of the Hessian at local minima and the regularization function of SAM. We also present a novel training procedure named Gradient norm Aware Minimization (GAM) to seek minima with uniformly small curvature across all directions. Experimental results show that GAM improves the generalization of models trained with current optimizers such as SGD and AdamW on various datasets and networks. Furthermore, we show that GAM can help SAM find flatter minima and achieve better generalization.
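    First-order flatness at a minimum can be probed numerically: estimate the maximal gradient norm within a perturbation radius and compare a sharp quadratic minimum against a flat one. The Monte-Carlo search below is a sketch of the measure only, not the GAM training procedure; the quadratics and radius are illustrative assumptions.

```python
import numpy as np

def grad(w, H):
    return H @ w                       # gradient of L(w) = 0.5 * w^T H w

def first_order_flatness(w, H, rho, n_samples=2000, seed=0):
    """Monte-Carlo estimate of max ||grad L(w + d)|| over ||d|| <= rho,
    i.e. the first-order flatness measure around w."""
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(n_samples):
        d = rng.normal(size=w.shape)
        d *= rho / np.linalg.norm(d)   # sample on the sphere ||d|| = rho
        best = max(best, float(np.linalg.norm(grad(w + d, H))))
    return best

H_sharp = np.diag([10.0, 1.0])         # sharp minimum: top Hessian eigenvalue 10
H_flat = np.diag([1.0, 0.5])           # flat minimum: top Hessian eigenvalue 1
w_star = np.zeros(2)                   # both quadratics are minimized at 0
rho = 0.1
f_sharp = first_order_flatness(w_star, H_sharp, rho)
f_flat = first_order_flatness(w_star, H_flat, rho)
```

    At the minimum the measure approaches rho times the top Hessian eigenvalue (here 1.0 vs 0.1), illustrating the bound on the maximal Hessian eigenvalue mentioned above.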
    Efficient and Feasible Robotic Assembly Sequence Planning via Graph Representation Learning. (arXiv:2303.10135v3 [cs.RO] UPDATED)
    Automatic Robotic Assembly Sequence Planning (RASP) can significantly improve productivity and resilience in modern manufacturing, along with the growing need for greater product customization. One of the main challenges in realizing such automation lies in efficiently finding solutions from a growing number of potential sequences for increasingly complex assemblies; in addition, costly feasibility checks are always required for the robotic system. To address this, we propose a holistic graphical approach comprising a graph representation called Assembly Graph for product assemblies and a policy architecture, Graph Assembly Processing Network, dubbed GRACE, for assembly sequence generation. GRACE extracts meaningful information from the graph input and predicts assembly sequences in a step-by-step manner. In experiments, we show that our approach can predict feasible assembly sequences across product variants of aluminum profiles based on data collected in simulation of a dual-armed robotic system. We further demonstrate that our method can detect infeasible assemblies, substantially alleviating the undesirable impact of false predictions and hence facilitating real-world deployment. Code and training data are available at https://github.com/DLR-RM/GRACE.
    SLPerf: a Unified Framework for Benchmarking Split Learning. (arXiv:2304.01502v2 [cs.LG] UPDATED)
    Data privacy concerns have made centralized training on data scattered across silos infeasible, leading to the need for collaborative learning frameworks. To address this, two prominent frameworks have emerged: federated learning (FL) and split learning (SL). While FL has established various benchmark frameworks and research libraries, SL currently lacks a unified library despite its diversity in terms of label sharing, model aggregation, and choice of cut layer. This lack of standardization makes comparing SL paradigms difficult. To address this, we propose SLPerf, a unified research framework and open research library for SL, and conduct extensive experiments on four widely-used datasets under both IID and Non-IID data settings. Our contributions include a comprehensive survey of recently proposed SL paradigms, a detailed benchmark comparison of different SL paradigms in different situations, and rich engineering take-away messages and research insights for improving SL paradigms. SLPerf can facilitate SL algorithm development and fair performance comparisons. The code is available at https://github.com/Rainysponge/Split-learning-Attacks .
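    The vanilla label-sharing SL paradigm that such a benchmark would compare can be sketched with a linear model and a single cut layer: the client sends activations ("smashed data") up, and the server sends the gradient at the cut back down. The architecture, learning rate, and synthetic data are illustrative assumptions; this is not SLPerf's API.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
true_w = rng.normal(size=(4, 1))
y = X @ true_w

# Client holds the layer below the cut, the server holds the layer above;
# only activations and their gradients cross the cut.
W1 = rng.normal(size=(4, 3)) * 0.1    # client-side weights
W2 = rng.normal(size=(3, 1)) * 0.1    # server-side weights
lr, losses = 0.05, []
for _ in range(300):
    h = X @ W1                        # client forward; h is sent to the server
    pred = h @ W2                     # server forward
    err = pred - y
    losses.append(float((err ** 2).mean()))
    g_pred = 2 * err / len(X)         # server backward
    g_W2 = h.T @ g_pred
    g_h = g_pred @ W2.T               # gradient returned to the client at the cut
    g_W1 = X.T @ g_h                  # client backward
    W2 -= lr * g_W2
    W1 -= lr * g_W1
```

    The paradigms surveyed by SLPerf differ in exactly which tensors cross this boundary (labels, activations, aggregated models) and where the cut sits.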
    PCL-Indexability and Whittle Index for Restless Bandits with General Observation Models. (arXiv:2307.03034v1 [stat.ML])
    In this paper, we consider a general observation model for restless multi-armed bandit problems. The player must operate through a feedback mechanism that is error-prone due to resource constraints or environmental and intrinsic noise. By establishing a general probabilistic model for the dynamics of feedback/observation, we formulate the problem as a restless bandit with a countable belief state space starting from an arbitrary initial belief (a priori information). We apply the achievable region method with partial conservation law (PCL) to the infinite-state problem and analyze its indexability and priority index (Whittle index). Finally, we propose an approximation process that transforms the problem into one to which the AG algorithm of Ni\~no-Mora and Bertsimas for finite-state problems can be applied. Numerical experiments show that our algorithm achieves excellent performance.
    Memory-efficient NLLB-200: Language-specific Expert Pruning of a Massively Multilingual Machine Translation Model. (arXiv:2212.09811v2 [cs.CL] UPDATED)
    Compared to conventional bilingual translation systems, massively multilingual machine translation is appealing because a single model can translate into multiple languages and benefit from knowledge transfer for low-resource languages. On the other hand, massively multilingual models suffer from the curse of multilinguality unless their size is scaled up massively, which in turn increases their training and inference costs. Sparse Mixture-of-Experts models are a way to drastically increase model capacity without a proportional increase in computation. The recently released NLLB-200 is an example of such a model: it covers 202 languages but requires at least four 32GB GPUs just for inference. In this work, we propose a pruning method that allows the removal of up to 80\% of experts with a negligible loss in translation quality, making it feasible to run the model on a single 32GB GPU. Further analysis suggests that our pruning metrics allow us to identify language-specific experts and prune experts that are not relevant for a given language pair.
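    A simple stand-in for metric-based expert pruning: rank experts by how often the router activates them for one language pair and keep the top fraction. The Zipf-distributed counts and keep fraction are assumptions for illustration; the paper's pruning metrics are more refined than raw activation counts.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts = 128
# Hypothetical per-expert gate activation counts collected while decoding
# one language pair; routing mass is typically concentrated on a few experts.
counts = rng.zipf(2.0, size=n_experts).astype(float)

def prune_experts(counts, keep_fraction=0.2):
    """Keep the most-activated experts for this language pair and report
    how much routing mass they retain (a sketch of metric-based pruning,
    not NLLB-200's actual criterion)."""
    k = max(1, int(len(counts) * keep_fraction))
    keep = np.argsort(counts)[::-1][:k]       # indices of the top-k experts
    coverage = float(counts[keep].sum() / counts.sum())
    return keep, coverage

keep, coverage = prune_experts(counts, keep_fraction=0.2)
```

    When routing is skewed, a small fraction of experts covers most of the mass, which is what makes the 80\% removal reported above plausible.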
    Intercomparison of Brown Dwarf Model Grids and Atmospheric Retrieval Using Machine Learning. (arXiv:2305.07719v2 [astro-ph.SR] UPDATED)
    Understanding differences between sub-stellar spectral data and models has proven to be a major challenge, especially for self-consistent model grids that are necessary for a thorough investigation of brown dwarf atmospheres. Using the supervised machine learning method of the random forest, we study the information content of 14 previously published model grids of brown dwarfs (from 1997 to 2021). The random forest method allows us to analyze the predictive power of these model grids, as well as interpret data within the framework of Approximate Bayesian Computation (ABC). Our curated dataset includes 3 benchmark brown dwarfs (Gl 570D, {\epsilon} Indi Ba and Bb) as well as a sample of 19 L and T dwarfs; this sample was previously analyzed in Lueber et al. (2022) using traditional Bayesian methods (nested sampling). We find that the effective temperature of a brown dwarf can be robustly predicted independent of the model grid chosen for the interpretation. However, inference of the surface gravity is model-dependent. Specifically, the BT-Settl, Sonora Bobcat and Sonora Cholla model grids tend to predict logg ~3-4 (cgs units) even after data blueward of 1.2 {\mu}m have been disregarded to mitigate for our incomplete knowledge of the shapes of alkali lines. Two major, longstanding challenges associated with understanding the influence of clouds in brown dwarf atmospheres remain: our inability to model them from first principles and also to robustly validate these models.
    When No-Rejection Learning is Optimal for Regression with Rejection. (arXiv:2307.02932v1 [cs.LG])
    Learning with rejection is a prototypical model for studying the interaction between humans and AI on prediction tasks. The model has two components, a predictor and a rejector. Upon the arrival of a sample, the rejector first decides whether to accept it; if accepted, the predictor fulfills the prediction task, and if rejected, the prediction is deferred to humans. The learning problem requires learning a predictor and a rejector simultaneously, which changes the structure of the conventional loss function and often results in non-convexity and inconsistency issues. For the classification-with-rejection problem, several works develop surrogate losses for joint learning with provable consistency guarantees; in parallel, there has been less work on the regression counterpart. We study the regression with rejection (RwR) problem and investigate the no-rejection learning strategy, which treats the RwR problem as a standard regression task to learn the predictor. We establish that the suboptimality of the no-rejection learning strategy observed in the literature can be mitigated by enlarging the function class of the predictor. We then introduce a truncated loss to single out the learning for the predictor, and we show that a consistent surrogate property can be established for the predictor individually more easily than for the predictor and the rejector jointly. Our findings advocate a two-step learning procedure that first uses all the data to learn the predictor and then calibrates the prediction loss for the rejector. This is better aligned with the common intuition that more data samples lead to a better predictor, and it calls for more effort on the design of calibration algorithms for learning the rejector. While our discussion mainly focuses on the regression problem, the theoretical results and insights generalize to the classification problem as well.
    BaBE: Enhancing Fairness via Estimation of Latent Explaining Variables. (arXiv:2307.02891v1 [cs.LG])
    We consider the problem of unfair discrimination between two groups and propose a pre-processing method to achieve fairness. Corrective methods like statistical parity usually lead to poor accuracy and do not really achieve fairness in situations where there is a correlation between the sensitive attribute S and the legitimate attribute E (explanatory variable) that should determine the decision. To overcome these drawbacks, other notions of fairness have been proposed, in particular, conditional statistical parity and equal opportunity. However, E is often not directly observable in the data, i.e., it is a latent variable. We may observe some other variable Z representing E, but the problem is that Z may also be affected by S, hence Z itself can be biased. To deal with this problem, we propose BaBE (Bayesian Bias Elimination), an approach based on a combination of Bayes inference and the Expectation-Maximization method, to estimate the most likely value of E for a given Z for each group. The decision can then be based directly on the estimated E. We show, by experiments on synthetic and real data sets, that our approach provides a good level of fairness as well as high accuracy.
    Trainable Weight Averaging: A General Approach for Subspace Training. (arXiv:2205.13104v2 [cs.LG] UPDATED)
    Training deep neural networks (DNNs) in low-dimensional subspaces is a promising direction for achieving efficient training and better generalization performance. Previous works extract the subspaces by using random projection or by performing dimensionality reduction on the training trajectory, but these methods can be inefficient or unstable in terms of dimensionality and numerical operations. In this paper, we connect subspace training to weight averaging and propose Trainable Weight Averaging (TWA), a general approach for subspace training that generalizes these previous efforts. TWA is efficient in terms of dimensionality and easy to use, making it a promising method for subspace training. We further design an efficient scheme for subspace training on large-scale problems, which allows parallel training across multiple nodes while evenly distributing the memory and computation burden among them. We apply TWA to efficient neural network training and to improving fine-tuning performance, demonstrating the efficiency and effectiveness of our approach. We conduct extensive experiments covering various benchmark computer vision and natural language processing tasks with various architectures. The implementation is available at https://github.com/nblt/TWA.
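    The core idea, optimizing a handful of averaging coefficients over saved checkpoints instead of all model weights, can be sketched on a toy quadratic loss. The checkpoint trajectory, step sizes, and loss below are illustrative assumptions, not the paper's large-scale scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50
w_opt = rng.normal(size=d)                 # optimum of a toy quadratic "training loss"

def loss(w):
    return 0.5 * float(np.sum((w - w_opt) ** 2))

def grad(w):
    return w - w_opt

# Checkpoints from a short gradient-descent trajectory form the subspace basis.
w = np.zeros(d)
checkpoints = []
for _ in range(5):
    w -= 0.3 * grad(w)
    checkpoints.append(w.copy())
W = np.stack(checkpoints)                  # (5, d) basis of the training subspace

# TWA-style step: train 5 averaging coefficients instead of all d weights.
alpha = np.full(5, 1 / 5)                  # initialize at plain averaging
for _ in range(200):
    g_alpha = W @ grad(alpha @ W)          # chain rule through w = alpha @ W
    alpha -= 0.005 * g_alpha

loss_avg = loss(W.mean(axis=0))            # fixed (plain) weight averaging
loss_twa = loss(alpha @ W)                 # trainable weight averaging
```

    Because the coefficients are trained, TWA can reach any point in the checkpoint span, so it never does worse than the fixed average on the training objective.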
    Dynamic Observation Policies in Observation Cost-Sensitive Reinforcement Learning. (arXiv:2307.02620v1 [cs.LG])
    Reinforcement learning (RL) has been shown to learn sophisticated control policies for complex tasks including games, robotics, heating and cooling systems, and text generation. The action-perception cycle in RL, however, generally assumes that a measurement of the state of the environment is available at each time step without cost. In applications such as deep-sea and planetary robot exploration, materials design, and medicine, however, there can be a high cost associated with measuring, or even approximating, the state of the environment. In this paper, we survey the recently growing literature adopting the perspective that an RL agent might not need, or even want, a costly measurement at each time step. Within this context, we propose the Deep Dynamic Multi-Step Observationless Agent (DMSOA), contrast it with the literature, and empirically evaluate it on OpenAI Gym and Atari Pong environments. Our results show that DMSOA learns a better policy with fewer decision steps and measurements than the considered alternative from the literature.
    T-MARS: Improving Visual Representations by Circumventing Text Feature Learning. (arXiv:2307.03132v1 [cs.CV])
    Large web-sourced multimodal datasets have powered a slew of new methods for learning general-purpose visual representations, advancing the state of the art in computer vision and revolutionizing zero- and few-shot recognition. One crucial decision facing practitioners is how, if at all, to curate these ever-larger datasets. For example, the creators of the LAION-5B dataset chose to retain only image-caption pairs whose CLIP similarity score exceeded a designated threshold. In this paper, we propose a new state-of-the-art data filtering approach motivated by our observation that nearly 40% of LAION's images contain text that overlaps significantly with the caption. Intuitively, such data could be wasteful as it incentivizes models to perform optical character recognition rather than learning visual features. However, naively removing all such data could also be wasteful, as it throws away images that contain visual features (in addition to overlapping text). Our simple and scalable approach, T-MARS (Text Masking and Re-Scoring), filters out only those pairs where the text dominates the remaining visual features -- by first masking out the text and then filtering out those with a low CLIP similarity score of the masked image. Experimentally, T-MARS outperforms the top-ranked method on the "medium scale" of DataComp (a data filtering benchmark) by a margin of 6.5% on ImageNet and 4.7% on VTAB. Additionally, our systematic evaluation on various data pool sizes from 2M to 64M shows that the accuracy gains enjoyed by T-MARS linearly increase as data and compute are scaled exponentially. Code is available at https://github.com/locuslab/T-MARS.
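    The filtering rule above can be sketched independently of any particular text detector or CLIP model: mask the rendered text, re-score the masked image against the caption, and keep the pair only if the score still clears a threshold. The `mask_text`/`clip_score` stand-ins and the toy feature vectors below are assumptions, not the paper's components.

```python
import numpy as np

def t_mars_filter(pairs, mask_text, clip_score, threshold):
    """T-MARS-style filtering: re-score each image-caption pair AFTER
    masking the text rendered in the image, and keep only the pairs whose
    masked-image similarity still clears the threshold."""
    kept = []
    for image, caption in pairs:
        if clip_score(mask_text(image), caption) >= threshold:
            kept.append((image, caption))
    return kept

# Toy stand-ins: an "image" carries separate visual and rendered-text
# feature vectors; masking zeroes the text component; similarity is cosine.
def mask_text(image):
    return {"visual": image["visual"], "text": np.zeros_like(image["text"])}

def clip_score(image, caption_vec):
    feat = image["visual"] + image["text"]
    denom = np.linalg.norm(feat) * np.linalg.norm(caption_vec) + 1e-9
    return float(feat @ caption_vec / denom)

cap = np.array([1.0, 0.0])
# Matches the caption through genuine visual features:
visual_match = ({"visual": np.array([1.0, 0.1]), "text": np.zeros(2)}, cap)
# Matches the caption only through text rendered in the image:
text_only = ({"visual": np.array([0.0, 1.0]), "text": np.array([5.0, 0.0])}, cap)
kept = t_mars_filter([visual_match, text_only], mask_text, clip_score, threshold=0.5)
```

    The pair that matched its caption only via the rendered text loses its score once the text is masked and is filtered out, while the visually matching pair survives.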
    Approximating the Shapley Value without Marginal Contributions. (arXiv:2302.00736v2 [cs.LG] UPDATED)
    The Shapley value is arguably the most popular approach for assigning a meaningful contribution value to players in a cooperative game, and it has recently been used intensively in explainable artificial intelligence. Its meaningfulness is due to axiomatic properties that only the Shapley value satisfies, which, however, comes at the expense of an exact computation growing exponentially with the number of agents. Accordingly, a number of works are devoted to the efficient approximation of Shapley values, most of which revolve around the notion of an agent's marginal contribution. In this paper, we propose SVARM and Stratified SVARM, two parameter-free and domain-independent approximation algorithms based on a representation of the Shapley value detached from the notion of marginal contributions. We prove unmatched theoretical guarantees regarding their approximation quality and provide empirical results on synthetic games as well as common explainability use cases, comparing our algorithms with state-of-the-art methods.
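    For context, here is the quantity being approximated, computed exactly by enumerating all coalitions: the exponential-time marginal-contribution formula that schemes like SVARM are designed to sidestep. The three-player glove-style game is an illustrative choice, not from the paper.

```python
import itertools
import math

def exact_shapley(n, value):
    """Exact Shapley values via the marginal-contribution formula,
    enumerating all 2^(n-1) coalitions per player. `value` maps a
    frozenset of players to a payoff."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            # Weight of a coalition of size r not containing player i.
            weight = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
            for S in itertools.combinations(others, r):
                S = frozenset(S)
                phi[i] += weight * (value(S | {i}) - value(S))
    return phi

# Glove-style game: a coalition earns the number of matched left/right gloves.
left, right = {0, 1}, {2}
def value(S):
    return min(len(S & left), len(S & right))

phi = exact_shapley(3, value)
```

    The scarce right-glove holder receives 2/3 of the total payoff and the two left-glove holders 1/6 each; efficiency (the values sum to the grand-coalition payoff) holds by construction.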
    Learning to reconstruct the bubble distribution with conductivity maps using Invertible Neural Networks and Error Diffusion. (arXiv:2307.02496v1 [eess.IV])
    Electrolysis is crucial for eco-friendly hydrogen production, but gas bubbles generated during the process hinder reactions, reduce cell efficiency, and increase energy consumption. Additionally, these gas bubbles cause changes in the conductivity inside the cell, resulting in corresponding variations in the induced magnetic field around the cell. Therefore, measuring these gas bubble-induced magnetic field fluctuations with external magnetic sensors and solving the inverse problem of the Biot-Savart law allows for estimating the conductivity in the cell and, thus, bubble size and location. However, determining high-resolution conductivity maps from only a few induced magnetic field measurements is an ill-posed inverse problem. To overcome this, we exploit Invertible Neural Networks (INNs) to reconstruct the conductivity field. Our qualitative results and quantitative evaluation using random error diffusion show that the INN achieves far superior performance compared to Tikhonov regularization.
    Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations. (arXiv:2206.04779v3 [cs.LG] UPDATED)
    Offline reinforcement learning has shown great promise in leveraging large pre-collected datasets for policy learning, allowing agents to forgo often-expensive online data collection. However, offline reinforcement learning from visual observations with continuous action spaces remains under-explored, with a limited understanding of the key challenges in this complex domain. In this paper, we establish simple baselines for continuous control in the visual domain and introduce a suite of benchmarking tasks for offline reinforcement learning from visual observations designed to better represent the data distributions present in real-world offline RL problems and guided by a set of desiderata for offline RL from visual observations, including robustness to visual distractions and visually identifiable changes in dynamics. Using this suite of benchmarking tasks, we show that simple modifications to two popular vision-based online reinforcement learning algorithms, DreamerV2 and DrQ-v2, suffice to outperform existing offline RL methods and establish competitive baselines for continuous control in the visual domain. We rigorously evaluate these algorithms and perform an empirical evaluation of the differences between state-of-the-art model-based and model-free offline RL methods for continuous control from visual observations. All code and data used in this evaluation are open-sourced to facilitate progress in this domain.
    Hybrid Ground-State Quantum Algorithms based on Neural Schr\"odinger Forging. (arXiv:2307.02633v1 [quant-ph])
    Entanglement-forging-based variational algorithms leverage the bipartition of quantum systems to address ground-state problems. The primary limitation of these approaches lies in the exponential summation required over the numerous potential basis states, or bitstrings, when performing the Schmidt decomposition of the whole system. To overcome this challenge, we propose a new method for entanglement forging that employs generative neural networks to identify the most pertinent bitstrings, eliminating the need for the exponential sum. Through empirical demonstrations on systems of increasing complexity, we show that the proposed algorithm achieves comparable or superior performance to the existing standard implementation of entanglement forging. Moreover, by controlling the amount of required resources, this scheme can be applied to larger systems, as well as to non-permutation-invariant systems, where the latter constraint is associated with the Heisenberg forging procedure. We substantiate our findings through numerical simulations conducted on spin models with one-dimensional ring and two-dimensional triangular-lattice topologies, as well as nuclear shell model configurations.
    Scaling In-Context Demonstrations with Structured Attention. (arXiv:2307.02690v1 [cs.CL])
    The recent surge of large language models (LLMs) highlights their ability to perform in-context learning, i.e., to "learn" a task from a few demonstrations in the context without any parameter updates. However, their in-context learning capabilities are limited by the model architecture: 1) the use of demonstrations is constrained by a maximum sentence length due to positional embeddings; 2) the quadratic complexity of attention hinders users from using more demonstrations efficiently; 3) LLMs are shown to be sensitive to the order of the demonstrations. In this work, we tackle these challenges by proposing a better architectural design for in-context learning. We propose SAICL (Structured Attention for In-Context Learning), which replaces full attention with a structured attention mechanism designed for in-context learning that removes unnecessary dependencies between individual demonstrations, while making the model invariant to the permutation of demonstrations. We evaluate SAICL in a meta-training framework and show that it achieves comparable or better performance than full attention while obtaining up to a 3.4x inference speed-up. SAICL also consistently outperforms a strong Fusion-in-Decoder (FiD) baseline that processes each demonstration independently. Finally, thanks to its linear nature, we demonstrate that SAICL can easily scale to hundreds of demonstrations with continuous performance gains.
    A Detailed Study of Interpretability of Deep Neural Network based Top Taggers. (arXiv:2210.04371v4 [hep-ex] UPDATED)
    Recent developments in the methods of explainable AI (XAI) allow researchers to explore the inner workings of deep neural networks (DNNs), revealing crucial information about input-output relationships and realizing how data connects with machine learning models. In this paper we explore interpretability of DNN models designed to identify jets coming from top quark decay in high energy proton-proton collisions at the Large Hadron Collider (LHC). We review a subset of existing top tagger models and explore different quantitative methods to identify which features play the most important roles in identifying the top jets. We also investigate how and why feature importance varies across different XAI metrics, how correlations among features impact their explainability, and how latent space representations encode information as well as correlate with physically meaningful quantities. Our studies uncover some major pitfalls of existing XAI methods and illustrate how they can be overcome to obtain consistent and meaningful interpretation of these models. We additionally illustrate the activity of hidden layers as Neural Activation Pattern (NAP) diagrams and demonstrate how they can be used to understand how DNNs relay information across the layers and how this understanding can help to make such models significantly simpler by allowing effective model reoptimization and hyperparameter tuning. These studies not only facilitate a methodological approach to interpreting models but also unveil new insights about what these models learn. Incorporating these observations into augmented model design, we propose the Particle Flow Interaction Network (PFIN) model and demonstrate how interpretability-inspired model augmentation can improve top tagging performance.
    FedVS: Straggler-Resilient and Privacy-Preserving Vertical Federated Learning for Split Models. (arXiv:2304.13407v3 [cs.LG] UPDATED)
    In a vertical federated learning (VFL) system consisting of a central server and many distributed clients, the training data are vertically partitioned such that different features are privately stored on different clients. The problem of split VFL is to train a model split between the server and the clients. This paper aims to address two major challenges in split VFL: 1) performance degradation due to straggling clients during training; and 2) data and model privacy leakage from clients' uploaded data embeddings. We propose FedVS to simultaneously address these two challenges. The key idea of FedVS is to design secret sharing schemes for the local data and models, such that information-theoretical privacy against colluding clients and curious server is guaranteed, and the aggregation of all clients' embeddings is reconstructed losslessly, via decrypting computation shares from the non-straggling clients. Extensive experiments on various types of VFL datasets (including tabular, CV, and multi-view) demonstrate the universal advantages of FedVS in straggler mitigation and privacy protection over baseline protocols.
    An explainable model to support the decision about the therapy protocol for AML. (arXiv:2307.02631v1 [cs.LG])
    Acute Myeloid Leukemia (AML) is one of the most aggressive types of hematological neoplasm. To support the specialists' decision about the appropriate therapy, patients with AML receive a prognostic of outcomes according to their cytogenetic and molecular characteristics, often divided into three risk categories: favorable, intermediate, and adverse. However, the current risk classification has known problems, such as the heterogeneity between patients of the same risk group and no clear definition of the intermediate risk category. Moreover, as most patients with AML receive an intermediate-risk classification, specialists often demand other tests and analyses, leading to delayed treatment and worsening of the patient's clinical condition. This paper presents the data analysis and an explainable machine-learning model to support the decision about the most appropriate therapy protocol according to the patient's survival prediction. In addition to the prediction model being explainable, the results obtained are promising and indicate that it is possible to use it to support the specialists' decisions safely. Most importantly, the findings offered in this study have the potential to open new avenues of research toward better treatments and prognostic markers.
    Can Domain Adaptation Improve Accuracy and Fairness of Skin Lesion Classification?. (arXiv:2307.03157v1 [cs.CV])
    Deep learning-based diagnostic systems have demonstrated potential in classifying skin cancer conditions when labeled training examples are abundant. However, skin lesion analysis often suffers from a scarcity of labeled data, hindering the development of an accurate and reliable diagnostic system. In this work, we leverage multiple skin lesion datasets and investigate the feasibility of various unsupervised domain adaptation (UDA) methods in binary and multi-class skin lesion classification. In particular, we assess three UDA training schemes: single-source, combined-source, and multi-source. Our experimental results show that UDA is effective in binary classification, with further improvement observed when imbalance is mitigated. In the multi-class task, its performance is less prominent, and the imbalance problem again needs to be addressed to achieve above-baseline accuracy. Through our quantitative analysis, we find that the test error of multi-class tasks is strongly correlated with label shift, and that feature-level UDA methods have limitations when handling imbalanced datasets. Finally, our study reveals that UDA can effectively reduce bias against minority groups and promote fairness, even without the explicit use of fairness-focused techniques.
    An AI tool for automated analysis of large-scale unstructured clinical cine CMR databases. (arXiv:2206.08137v2 [eess.IV] UPDATED)
    Artificial intelligence (AI) techniques have been proposed for automating analysis of short axis (SAX) cine cardiac magnetic resonance (CMR), but no CMR analysis tool exists to automatically analyse large (unstructured) clinical CMR datasets. We develop and validate a robust AI tool for start-to-end automatic quantification of cardiac function from SAX cine CMR in large clinical databases. Our pipeline for processing and analysing CMR databases includes automated steps to identify the correct data, robust image pre-processing, an AI algorithm for biventricular segmentation of SAX CMR and estimation of functional biomarkers, and automated post-analysis quality control to detect and correct errors. The segmentation algorithm was trained on 2793 CMR scans from two NHS hospitals and validated on additional cases from this dataset (n=414) and five external datasets (n=6888), including scans of patients with a range of diseases acquired at 12 different centres using CMR scanners from all major vendors. Median absolute errors in cardiac biomarkers were within the range of inter-observer variability: <8.4mL (left ventricle volume), <9.2mL (right ventricle volume), <13.3g (left ventricular mass), and <5.9% (ejection fraction) across all datasets. Stratification of cases according to phenotypes of cardiac disease and scanner vendors showed good performance across all groups. We show that our proposed tool, which combines image pre-processing steps, a domain-generalisable AI algorithm trained on a large-scale multi-domain CMR dataset and quality control steps, allows robust analysis of (clinical or research) databases from multiple centres, vendors, and cardiac diseases. This enables translation of our tool for use in fully-automated processing of large multi-centre databases.
    Focused Transformer: Contrastive Training for Context Scaling. (arXiv:2307.03170v1 [cs.CL])
    Large language models have an exceptional capability to incorporate new information in a contextual manner. However, the full potential of such an approach is often restrained due to a limitation in the effective context length. One solution to this issue is to endow an attention layer with access to an external memory, which comprises (key, value) pairs. Yet, as the number of documents increases, the proportion of relevant keys to irrelevant ones decreases, leading the model to focus more on the irrelevant keys. We identify a significant challenge, dubbed the distraction issue, where keys linked to different semantic values might overlap, making them hard to distinguish. To tackle this problem, we introduce the Focused Transformer (FoT), a technique that employs a training process inspired by contrastive learning. This novel approach enhances the structure of the (key, value) space, enabling an extension of the context length. Our method allows for fine-tuning pre-existing, large-scale models to lengthen their effective context. This is demonstrated by our fine-tuning of $3B$ and $7B$ OpenLLaMA checkpoints. The resulting models, which we name LongLLaMA, exhibit advancements in tasks requiring a long context. We further illustrate that our LongLLaMA models adeptly manage a $256k$ context length for passkey retrieval.
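The distraction issue can be made concrete: with softmax attention over an external (key, value) memory, adding many irrelevant keys dilutes the weight placed on the relevant ones. A toy NumPy illustration (all names are ours; this sketches the dilution effect, not the FoT training procedure itself):

```python
import numpy as np

rng = np.random.default_rng(7)
d = 16  # key/query dimension

def attention_weight_on(query, keys, idx):
    """Softmax attention weight that `query` places on `keys[idx]`."""
    scores = keys @ query / np.sqrt(d)
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights[idx]

query = rng.normal(size=d)
relevant = query / np.linalg.norm(query)   # a key aligned with the query
distractors = rng.normal(size=(1000, d))   # keys for unrelated documents

small_mem = np.vstack([relevant, distractors[:10]])
large_mem = np.vstack([relevant, distractors])

w_small = attention_weight_on(query, small_mem, 0)
w_large = attention_weight_on(query, large_mem, 0)
# Growing the memory with irrelevant keys dilutes the relevant key's weight.
```

FoT's contrastive training aims to shape the (key, value) space so that this dilution is less severe as the memory grows.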
    Kernels, Data & Physics. (arXiv:2307.02693v1 [cs.LG])
    Lecture notes from the course given by Professor Julia Kempe at the summer school "Statistical physics of Machine Learning" in Les Houches. The notes discuss the so-called NTK approach to problems in machine learning, which consists of gaining an understanding of generally unsolvable problems by finding a tractable kernel formulation. The notes are mainly focused on practical applications such as data distillation and adversarial robustness; examples of inductive bias are also discussed.
    Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching. (arXiv:2205.03447v7 [cs.AI] UPDATED)
    Ontology Matching (OM) plays an important role in many domains such as bioinformatics and the Semantic Web, and its research is becoming increasingly popular, especially with the application of machine learning (ML) techniques. Although the Ontology Alignment Evaluation Initiative (OAEI) represents an impressive effort for the systematic evaluation of OM systems, it still suffers from several limitations including limited evaluation of subsumption mappings, suboptimal reference mappings, and limited support for the evaluation of ML-based systems. To tackle these limitations, we introduce five new biomedical OM tasks involving ontologies extracted from Mondo and UMLS. Each task includes both equivalence and subsumption matching; the quality of reference mappings is ensured by human curation, ontology pruning, etc.; and a comprehensive evaluation framework is proposed to measure OM performance from various perspectives for both ML-based and non-ML-based OM systems. We report evaluation results for OM systems of different types to demonstrate the usage of these resources, all of which are publicly available as part of the new BioML track at OAEI 2022.
    The Role of Subgroup Separability in Group-Fair Medical Image Classification. (arXiv:2307.02791v1 [cs.CV])
    We investigate performance disparities in deep classifiers. We find that the ability of classifiers to separate individuals into subgroups varies substantially across medical imaging modalities and protected characteristics; crucially, we show that this property is predictive of algorithmic bias. Through theoretical analysis and extensive empirical evaluation, we find a relationship between subgroup separability, subgroup disparities, and performance degradation when models are trained on data with systematic bias such as underdiagnosis. Our findings shed new light on the question of how models become biased, providing important insights for the development of fair medical imaging AI.
    Learning to Predict Navigational Patterns from Partial Observations. (arXiv:2304.13242v2 [cs.CV] UPDATED)
    Human beings cooperatively navigate rule-constrained environments by adhering to mutually known navigational patterns, which may be represented as directional pathways or road lanes. Inferring these navigational patterns from incompletely observed environments is required for intelligent mobile robots operating in unmapped locations. However, algorithmically defining these navigational patterns is nontrivial. This paper presents the first self-supervised learning (SSL) method for learning to infer navigational patterns in real-world environments from partial observations only. We explain how geometric data augmentation, predictive world modeling, and an information-theoretic regularizer enable our model to predict an unbiased local directional soft lane probability (DSLP) field in the limit of infinite data. We demonstrate how to infer global navigational patterns by fitting a maximum likelihood graph to the DSLP field. Experiments show that our SSL model outperforms two SOTA supervised lane graph prediction models on the nuScenes dataset. We propose our SSL method as a scalable and interpretable continual learning paradigm for navigation by perception. Code is available at https://github.com/robin-karlsson0/dslp.
    Learning Multi-Agent Intention-Aware Communication for Optimal Multi-Order Execution in Finance. (arXiv:2307.03119v1 [cs.AI])
    Order execution is a fundamental task in quantitative finance, aiming at finishing acquisition or liquidation for a number of trading orders of specific assets. Recent advances in model-free reinforcement learning (RL) provide a data-driven solution to the order execution problem. However, existing works always optimize execution for an individual order, overlooking the practice that multiple orders are specified to execute simultaneously, resulting in suboptimality and bias. In this paper, we first present a multi-agent RL (MARL) method for multi-order execution considering practical constraints. Specifically, we treat every agent as an individual operator trading one specific order, while communicating with the others and collaborating to maximize overall profits. Nevertheless, existing MARL algorithms often incorporate communication among agents by exchanging only the information of their partial observations, which is inefficient in complicated financial markets. To improve collaboration, we then propose a learnable multi-round communication protocol in which agents communicate their intended actions to each other and refine them accordingly. It is optimized through a novel action value attribution method which is provably consistent with the original learning objective yet more efficient. Experiments on data from two real-world markets illustrate superior performance, with significantly better collaboration effectiveness achieved by our method.
    An Automatic Guidance and Quality Assessment System for Doppler Imaging of Umbilical Artery. (arXiv:2304.05463v2 [eess.IV] UPDATED)
    Examination of the umbilical artery with Doppler ultrasonography is performed to investigate blood supply to the fetus through the umbilical cord, which is vital for the monitoring of fetal health. Such examination involves several steps that must be performed correctly: identifying suitable sites on the umbilical artery for the measurement, acquiring the blood flow curve in the form of a Doppler spectrum, and ensuring compliance with a set of quality standards. These steps rely heavily on the operator's skill, and the shortage of experienced sonographers has thus created a demand for machine assistance. In this work, we propose an automatic system to fill the gap. By using a modified Faster R-CNN network, we obtain an algorithm that can suggest locations suitable for Doppler measurement. Meanwhile, we have also developed a method for assessment of the Doppler spectrum's quality. The proposed system is validated on 657 images from a national ultrasound screening database, with results demonstrating its potential as a guidance system.
    ZeroFlow: Fast Zero Label Scene Flow via Distillation. (arXiv:2305.10424v4 [cs.CV] UPDATED)
    Scene flow estimation is the task of describing the 3D motion field between temporally successive point clouds. State-of-the-art methods use strong priors and test-time optimization techniques, but require on the order of tens of seconds for large-scale point clouds, making them unusable as computer vision primitives for real-time applications such as open world object detection. Feed forward methods are considerably faster, running on the order of tens to hundreds of milliseconds for large-scale point clouds, but require expensive human supervision. To address both limitations, we propose Scene Flow via Distillation, a simple distillation framework that uses a label-free optimization method to produce pseudo-labels to supervise a feed forward model. Our instantiation of this framework, ZeroFlow, produces scene flow estimates in real-time on large-scale point clouds at quality competitive with state-of-the-art methods while using zero human labels. Notably, at test-time ZeroFlow is over 1000$\times$ faster than label-free state-of-the-art optimization-based methods on large-scale point clouds and over 1000$\times$ cheaper to train on unlabeled data compared to the cost of human annotation of that data. To facilitate research reuse, we release our code, trained model weights, and high quality pseudo-labels for the Argoverse 2 and Waymo Open datasets.
    PathProx: A Proximal Gradient Algorithm for Weight Decay Regularized Deep Neural Networks. (arXiv:2210.03069v4 [cs.LG] UPDATED)
    Weight decay is one of the most widely used forms of regularization in deep learning, and has been shown to improve generalization and robustness. The optimization objective driving weight decay is a sum of losses plus a term proportional to the sum of squared weights. This paper argues that stochastic gradient descent (SGD) may be an inefficient algorithm for this objective. For neural networks with ReLU activations, solutions to the weight decay objective are equivalent to those of a different objective in which the regularization term is instead a sum of products of $\ell_2$ (not squared) norms of the input and output weights associated with each ReLU neuron. This alternative (and effectively equivalent) regularization suggests a novel proximal gradient algorithm for network training. Theory and experiments support the new training approach, showing that it can converge much faster to the sparse solutions it shares with standard weight decay training.
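The equivalence claimed above hinges on the positive homogeneity of ReLU: rescaling a neuron's input weights by $c > 0$ and its output weight by $1/c$ leaves the network function unchanged, and minimizing the squared-norm penalty over such rescalings yields the product-of-norms form by the AM-GM inequality. A small NumPy sketch of the two penalties for a one-hidden-layer ReLU network (function and variable names are ours; the actual PathProx algorithm additionally applies proximal steps, which this does not show):

```python
import numpy as np

rng = np.random.default_rng(0)
W_in = rng.normal(size=(5, 3))   # input weights, one row per ReLU neuron
w_out = rng.normal(size=5)       # one output weight per neuron

def squared_norm_penalty(W_in, w_out):
    # Standard weight decay term: sum of all squared weights.
    return (W_in ** 2).sum() + (w_out ** 2).sum()

def product_norm_penalty(W_in, w_out):
    # Per-neuron product of (unsquared) l2 norms of input and output weights.
    return (np.linalg.norm(W_in, axis=1) * np.abs(w_out)).sum()

# ReLU is positively homogeneous: scaling a neuron's input weights by c > 0
# and its output weight by 1/c leaves the network function unchanged.
c = 2.0
W_in_s, w_out_s = c * W_in, w_out / c

# The product penalty is invariant under this rescaling; the squared one is not.
invariant = np.isclose(product_norm_penalty(W_in, w_out),
                       product_norm_penalty(W_in_s, w_out_s))
# By AM-GM, squared penalty >= 2 * product penalty (equality at balanced scale).
gap = squared_norm_penalty(W_in, w_out) - 2 * product_norm_penalty(W_in, w_out)
```

The rescaling invariance is why the product-norm objective is a natural reformulation: it depends only on the function the network computes, not on an arbitrary per-neuron scale.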
    Through the Fairness Lens: Experimental Analysis and Evaluation of Entity Matching. (arXiv:2307.02726v1 [cs.DB])
    Entity matching (EM) is a challenging problem studied by different communities for over half a century. Algorithmic fairness has also become a timely topic to address machine bias and its societal impacts. Despite extensive research on these two topics, little attention has been paid to the fairness of entity matching. Towards addressing this gap, we perform an extensive experimental evaluation of a variety of EM techniques in this paper. We generated two social datasets from publicly available datasets for the purpose of auditing EM through the lens of fairness. Our findings underscore potential unfairness under two common conditions in real-world societies: (i) when some demographic groups are overrepresented, and (ii) when names are more similar in some groups compared to others. Among our many findings, it is noteworthy to mention that while various fairness definitions are valuable for different settings, due to the class-imbalance nature of EM, measures such as positive predictive value parity and true positive rate parity are, in general, more capable of revealing EM unfairness.
    DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models. (arXiv:2210.14896v4 [cs.CV] UPDATED)
    With recent advancements in diffusion models, users can generate high-quality images by writing text prompts in natural language. However, generating images with desired details requires proper prompts, and it is often unclear how a model reacts to different prompts or what the best prompts are. To help researchers tackle these critical challenges, we introduce DiffusionDB, the first large-scale text-to-image prompt dataset totaling 6.5TB, containing 14 million images generated by Stable Diffusion, 1.8 million unique prompts, and hyperparameters specified by real users. We analyze the syntactic and semantic characteristics of prompts. We pinpoint specific hyperparameter values and prompt styles that can lead to model errors and present evidence of potentially harmful model usage, such as the generation of misinformation. The unprecedented scale and diversity of this human-actuated dataset provide exciting research opportunities in understanding the interplay between prompts and generative models, detecting deepfakes, and designing human-AI interaction tools to help users more easily use these models. DiffusionDB is publicly available at: https://poloclub.github.io/diffusiondb.
    Loss Functions and Metrics in Deep Learning. A Review. (arXiv:2307.02694v1 [cs.LG])
    One of the essential components of deep learning is the choice of the loss function and performance metrics used to train and evaluate models. This paper reviews the most prevalent loss functions and performance measurements in deep learning. We examine the benefits and limits of each technique and illustrate their application to various deep-learning problems. Our review aims to give a comprehensive picture of the different loss functions and performance indicators used in the most common deep learning tasks and help practitioners choose the best method for their specific task.
    Human Inspired Progressive Alignment and Comparative Learning for Grounded Word Acquisition. (arXiv:2307.02615v1 [cs.CL])
    Human language acquisition is an efficient, supervised, and continual process. In this work, we took inspiration from how human babies acquire their first language, and developed a computational process for word acquisition through comparative learning. Motivated by cognitive findings, we generated a small dataset that enables the computational models to compare the similarities and differences of various attributes, and to learn to filter out and extract the common information for each shared linguistic label. We frame the acquisition of words not only as an information filtration process, but also as a representation-symbol mapping. This procedure does not involve a fixed vocabulary size, nor a discriminative objective, and allows the models to continually learn more concepts efficiently. Our results in controlled experiments have shown the potential of this approach for efficient continual learning of grounded words.
    Spotting Virus from Satellites: Modeling the Circulation of West Nile Virus Through Graph Neural Networks. (arXiv:2209.05251v2 [cs.CV] UPDATED)
    The occurrence of West Nile Virus (WNV) represents one of the most common mosquito-borne zoonosis viral infections. Its circulation is usually associated with climatic and environmental conditions suitable for vector proliferation and virus replication. On top of that, several statistical models have been developed to shape and forecast WNV circulation: in particular, the recent massive availability of Earth Observation (EO) data, coupled with the continuous advances in the field of Artificial Intelligence, offer valuable opportunities. In this paper, we seek to predict WNV circulation by feeding Deep Neural Networks (DNNs) with satellite images, which have been extensively shown to hold environmental and climatic features. Notably, while previous approaches analyze each geographical site independently, we propose a spatial-aware approach that considers also the characteristics of close sites. Specifically, we build upon Graph Neural Networks (GNN) to aggregate features from neighbouring places, and further extend these modules to consider multiple relations, such as the difference in temperature and soil moisture between two sites, as well as the geographical distance. Moreover, we inject time-related information directly into the model to take into account the seasonality of virus spread. We design an experimental setting that combines satellite images - from Landsat and Sentinel missions - with ground truth observations of WNV circulation in Italy. We show that our proposed Multi-Adjacency Graph Attention Network (MAGAT) consistently leads to higher performance when paired with an appropriate pre-training stage. Finally, we assess the importance of each component of MAGAT in our ablation studies.
    DeepOnto: A Python Package for Ontology Engineering with Deep Learning. (arXiv:2307.03067v1 [cs.AI])
    Applying deep learning techniques, particularly language models (LMs), in ontology engineering has attracted widespread attention. However, deep learning frameworks like PyTorch and Tensorflow are predominantly developed for Python programming, while widely-used ontology APIs, such as the OWL API and Jena, are primarily Java-based. To facilitate seamless integration of these frameworks and APIs, we present Deeponto, a Python package designed for ontology engineering. The package encompasses a core ontology processing module founded on the widely-recognised and reliable OWL API, encapsulating its fundamental features in a more "Pythonic" manner and extending its capabilities to include other essential components including reasoning, verbalisation, normalisation, projection, and more. Building on this module, Deeponto offers a suite of tools, resources, and algorithms that support various ontology engineering tasks, such as ontology alignment and completion, by harnessing deep learning methodologies, primarily pre-trained LMs. In this paper, we also demonstrate the practical utility of Deeponto through two use-cases: the Digital Health Coaching in Samsung Research UK and the Bio-ML track of the Ontology Alignment Evaluation Initiative (OAEI).
    Denoise Pretraining on Nonequilibrium Molecules for Accurate and Transferable Neural Potentials. (arXiv:2303.02216v2 [cs.LG] UPDATED)
    Recent advances in equivariant graph neural networks (GNNs) have made deep learning amenable to developing fast surrogate models to expensive ab initio quantum mechanics (QM) approaches for molecular potential predictions. However, building accurate and transferable potential models using GNNs remains challenging, as the data is greatly limited by the expensive computational costs and level of theory of QM methods, especially for large and complex molecular systems. In this work, we propose denoise pretraining on nonequilibrium molecular conformations to achieve more accurate and transferable GNN potential predictions. Specifically, atomic coordinates of sampled nonequilibrium conformations are perturbed by random noises and GNNs are pretrained to denoise the perturbed molecular conformations, recovering the original coordinates. Rigorous experiments on multiple benchmarks reveal that pretraining significantly improves the accuracy of neural potentials. Furthermore, we show that the proposed pretraining approach is model-agnostic, as it improves the performance of different invariant and equivariant GNNs. Notably, our models pretrained on small molecules demonstrate remarkable transferability, improving performance when fine-tuned on diverse molecular systems, including different elements, charged molecules, biomolecules, and larger systems. These results highlight the potential for leveraging denoise pretraining approaches to build more generalizable neural potentials for complex molecular systems.
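The pretraining objective described above reduces to predicting the injected coordinate noise. A minimal sketch of constructing one such training pair (all names are ours; no specific GNN architecture is assumed):

```python
import numpy as np

rng = np.random.default_rng(42)

def make_denoising_pair(coords, noise_scale=0.1):
    """Perturb atomic coordinates with Gaussian noise.

    The pretraining target is the injected noise; predicting it is
    equivalent to recovering the original coordinates.
    """
    noise = noise_scale * rng.normal(size=coords.shape)
    return coords + noise, noise

coords = rng.uniform(-1.0, 1.0, size=(12, 3))  # 12 atoms in 3-D
noisy_coords, target_noise = make_denoising_pair(coords)

# A GNN f would be trained to minimize ||f(noisy_coords) - target_noise||^2;
# subtracting the predicted noise reconstructs the original geometry.
recovered = noisy_coords - target_noise
```

Because the task requires only conformations (no QM labels), such pairs can be generated cheaply at scale, which is what makes this a pretraining signal.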
    Principal subbundles for dimension reduction. (arXiv:2307.03128v1 [stat.ME])
    In this paper we demonstrate how sub-Riemannian geometry can be used for manifold learning and surface reconstruction by combining local linear approximations of a point cloud to obtain lower dimensional bundles. Local approximations obtained by local PCAs are collected into a rank $k$ tangent subbundle on $\mathbb{R}^d$, $k<d$, which we call a principal subbundle. This determines a sub-Riemannian metric on $\mathbb{R}^d$. We show that sub-Riemannian geodesics with respect to this metric can successfully be applied to a number of important problems, such as: explicit construction of an approximating submanifold $M$, construction of a representation of the point-cloud in $\mathbb{R}^k$, and computation of distances between observations, taking the learned geometry into account. The reconstruction is guaranteed to equal the true submanifold in the limit case where tangent spaces are estimated exactly. Via simulations, we show that the framework is robust when applied to noisy data. Furthermore, the framework generalizes to observations on an a priori known Riemannian manifold.
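The rank-$k$ tangent estimate at a point can be sketched via a local PCA over that point's neighbourhood (illustrative only, with names of our choosing; the paper's construction further assembles these local estimates into a sub-Riemannian metric, which this does not show):

```python
import numpy as np

rng = np.random.default_rng(1)

def local_tangent(points, x, radius, k):
    """Estimate a rank-k tangent subspace at x via local PCA.

    Returns a (k, d) orthonormal basis given by the top-k principal
    directions of the neighbours of x within `radius`.
    """
    nbrs = points[np.linalg.norm(points - x, axis=1) < radius]
    centered = nbrs - nbrs.mean(axis=0)
    # Rows of Vt are principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[:k]

# Noisy samples from a circle in R^3 (a 1-D manifold, so k = 1).
t = rng.uniform(0.0, 2.0 * np.pi, size=400)
cloud = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)
cloud += 0.01 * rng.normal(size=cloud.shape)

# At (1, 0, 0) the circle's tangent line is the y-axis.
basis = local_tangent(cloud, x=np.array([1.0, 0.0, 0.0]), radius=0.3, k=1)
```

Collecting such bases over a grid of base points is the "tangent subbundle" ingredient; the sub-Riemannian geodesics then penalize motion orthogonal to these local subspaces.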
    Generalization Guarantees via Algorithm-dependent Rademacher Complexity. (arXiv:2307.02501v1 [stat.ML])
    Algorithm- and data-dependent generalization bounds are required to explain the generalization behavior of modern machine learning algorithms. In this context, there exist information-theoretic generalization bounds that involve (various forms of) mutual information, as well as bounds based on hypothesis set stability. We propose a conceptually related, but technically distinct complexity measure to control generalization error, which is the empirical Rademacher complexity of an algorithm- and data-dependent hypothesis class. Combining standard properties of Rademacher complexity with the convenient structure of this class, we are able to (i) obtain novel bounds based on the finite fractal dimension, which (a) extend previous fractal dimension-type bounds from continuous to finite hypothesis classes, and (b) avoid a mutual information term that was required in prior work; (ii) greatly simplify the proof of a recent dimension-independent generalization bound for stochastic gradient descent; and (iii) easily recover results for VC classes and compression schemes, similar to approaches based on conditional mutual information.
    A Near-Linear Time Algorithm for the Chamfer Distance. (arXiv:2307.03043v1 [cs.DS])
    For any two point sets $A,B \subset \mathbb{R}^d$ of size up to $n$, the Chamfer distance from $A$ to $B$ is defined as $\text{CH}(A,B)=\sum_{a \in A} \min_{b \in B} d_X(a,b)$, where $d_X$ is the underlying distance measure (e.g., the Euclidean or Manhattan distance). The Chamfer distance is a popular measure of dissimilarity between point clouds, used in many machine learning, computer vision, and graphics applications, and admits a straightforward $O(d n^2)$-time brute force algorithm. Further, the Chamfer distance is often used as a proxy for the more computationally demanding Earth-Mover (Optimal Transport) Distance. However, the \emph{quadratic} dependence on $n$ in the running time makes the naive approach intractable for large datasets. We overcome this bottleneck and present the first $(1+\varepsilon)$-approximate algorithm for estimating the Chamfer distance with a near-linear running time. Specifically, our algorithm runs in time $O(nd \log (n)/\varepsilon^2)$ and is implementable. Our experiments demonstrate that it is both accurate and fast on large high-dimensional datasets. We believe that our algorithm will open new avenues for analyzing large high-dimensional point clouds. We also give evidence that if the goal is to \emph{report} a $(1+\varepsilon)$-approximate mapping from $A$ to $B$ (as opposed to just its value), then any sub-quadratic time algorithm is unlikely to exist.
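The brute-force definition above can be computed directly; a minimal NumPy sketch of the naive $O(dn^2)$ algorithm (the function name is ours), against which the paper's near-linear algorithm is the speedup:

```python
import numpy as np

def chamfer_distance(A, B):
    """Brute-force Chamfer distance CH(A, B) = sum_{a in A} min_{b in B} ||a - b||.

    O(d |A| |B|) time, i.e. the naive quadratic algorithm, with the
    Euclidean distance as the underlying measure d_X.
    """
    diffs = A[:, None, :] - B[None, :, :]       # shape (|A|, |B|, d)
    dists = np.sqrt((diffs ** 2).sum(axis=-1))  # pairwise Euclidean distances
    return dists.min(axis=1).sum()              # nearest point in B, summed over A

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
ch_ab = chamfer_distance(A, B)
```

Note that CH is not symmetric in general: CH(A, B) and CH(B, A) take minima over different sets and must be computed separately.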
    Sampling-based Fast Gradient Rescaling Method for Highly Transferable Adversarial Attacks. (arXiv:2307.02828v1 [cs.CV])
    Deep neural networks are known to be vulnerable to adversarial examples crafted by adding human-imperceptible perturbations to the benign input. After achieving nearly 100% attack success rates in the white-box setting, more focus has shifted to black-box attacks, in which the transferability of adversarial examples has gained significant attention. In either case, common gradient-based methods generally use the sign function to generate perturbations on the gradient update, which offers a roughly correct direction and has achieved great success. But little work pays attention to its possible limitations. In this work, we observe that the deviation between the original gradient and the generated noise may lead to inaccurate gradient update estimation and suboptimal solutions for adversarial transferability. To this end, we propose a Sampling-based Fast Gradient Rescaling Method (S-FGRM). Specifically, we use data rescaling to substitute the sign function without extra computational cost. We further propose a Depth First Sampling method to eliminate the fluctuation of rescaling and stabilize the gradient update. Our method can be used in any gradient-based attack and is extensible to be integrated with various input transformation or ensemble methods to further improve adversarial transferability. Extensive experiments on the standard ImageNet dataset show that our method significantly boosts the transferability of gradient-based attacks and outperforms the state-of-the-art baselines.
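The deviation between the true gradient direction and its sign can be seen directly. Below is a toy comparison of the sign step with an illustrative l1-rescaled step; the rescaling used here is our own simplified stand-in, not the paper's exact S-FGRM formula:

```python
import numpy as np

g = np.array([10.0, 0.1, -0.1, 0.05])  # one dominant component, several small ones

# Standard sign-based step (as in FGSM-style attacks).
sign_step = np.sign(g)

# Illustrative rescaling: keep relative magnitudes, matched to the sign
# step's l1 norm so the two updates have comparable overall size.
rescaled_step = g / np.abs(g).sum() * np.abs(sign_step).sum()

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

cos_sign = cosine(sign_step, g)
cos_rescaled = cosine(rescaled_step, g)
# The rescaled step stays exactly aligned with the gradient; the sign step deviates.
```

The sign function flattens all magnitude information, so its direction diverges from the gradient whenever components differ greatly in scale; rescaling-based updates avoid that loss of information.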
    Evaluating the Planning and Operational Resilience of Electrical Distribution Systems with Distributed Energy Resources using Complex Network Theory. (arXiv:2208.11543v4 [eess.SY] UPDATED)
    Electrical distribution systems are extensively penetrated with Distributed Energy Resources (DERs) to cater to energy demands, with the general perception that this enhances the system's resilience. However, the integration of DERs may adversely affect grid operation and system resilience due to various factors such as their intermittent availability, the dynamics of weather conditions, non-linearity, complexity, the number of malicious threats, and the improved reliability requirements of consumers. This paper proposes a methodology to evaluate the planning and operational resilience of power distribution systems under extreme events and determines the withstand capability of the electrical network. The proposed framework is developed by effectively employing complex network theory. Correlated networks for undesirable configurations are developed from the time-series data of active power monitored at nodes of the electrical network. For these correlated networks, we compute network parameters such as the clustering coefficient, assortativity coefficient, average degree, and power-law exponent for anticipation, and the percolation threshold to determine the network's withstand capability under extreme conditions. The proposed methodology is also suitable for identifying the hosting capacity of solar panels in the system while maintaining resilience under different unfavourable conditions, and for identifying the most critical nodes of the system that could drive the system into non-resilience. This framework is demonstrated on the IEEE 123-node test feeder by generating active-power time-series data for a variety of electrical conditions using the simulation software GridLAB-D. The percolation threshold proved to be an effective metric for determining the planning and operational resilience of the power distribution system.
    ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation. (arXiv:2301.13166v3 [cs.AI] UPDATED)
    The ability to accurately locate and navigate to a specific object is a crucial capability for embodied agents that operate in the real world and interact with objects to complete tasks. Such object navigation tasks usually require large-scale training in visual environments with labeled objects, which generalizes poorly to novel objects in unknown environments. In this work, we present a novel zero-shot object navigation method, Exploration with Soft Commonsense constraints (ESC), that transfers commonsense knowledge in pre-trained models to open-world object navigation without any navigation experience or any other training on the visual environments. First, ESC leverages a pre-trained vision and language model for open-world prompt-based grounding and a pre-trained commonsense language model for room and object reasoning. Then ESC converts commonsense knowledge into navigation actions by modeling it as soft logic predicates for efficient exploration. Extensive experiments on MP3D, HM3D, and RoboTHOR benchmarks show that our ESC method improves significantly over baselines and achieves new state-of-the-art results for zero-shot object navigation (e.g., a 288% relative Success Rate improvement over CoW on MP3D).
    Improving Retrieval-Augmented Large Language Models via Data Importance Learning. (arXiv:2307.03027v1 [cs.LG])
    Retrieval augmentation enables large language models to take advantage of external knowledge, for example on tasks like question answering and data imputation. However, the performance of such retrieval-augmented models is limited by the data quality of their underlying retrieval corpus. In this paper, we propose an algorithm based on multilinear extension for evaluating the data importance of retrieved data points. There are exponentially many terms in the multilinear extension, and one key contribution of this paper is a polynomial time algorithm that computes exactly, given a retrieval-augmented model with an additive utility function and a validation set, the data importance of data points in the retrieval corpus using the multilinear extension of the model's utility function. We further propose an even more efficient $(\epsilon, \delta)$-approximation algorithm. Our experimental results illustrate that we can enhance the performance of large language models by only pruning or reweighting the retrieval corpus, without requiring further training. For some tasks, this even allows a small model (e.g., GPT-JT), augmented with a search engine API, to outperform GPT-3.5 (without retrieval augmentation). Moreover, we show that weights based on multilinear extension can be computed efficiently in practice (e.g., in less than ten minutes for a corpus with 100 million elements).
    Conditional Karhunen-Lo\'{e}ve regression model with Basis Adaptation for high-dimensional problems: uncertainty quantification and inverse modeling. (arXiv:2307.02572v1 [cs.LG])
    We propose a methodology for improving the accuracy of surrogate models of the observable response of physical systems as a function of the systems' spatially heterogeneous parameter fields with applications to uncertainty quantification and parameter estimation in high-dimensional problems. Practitioners often formulate finite-dimensional representations of spatially heterogeneous parameter fields using truncated unconditional Karhunen-Lo\'{e}ve expansions (KLEs) for a certain choice of unconditional covariance kernel and construct surrogate models of the observable response with respect to the random variables in the KLE. When direct measurements of the parameter fields are available, we propose improving the accuracy of these surrogate models by representing the parameter fields via conditional Karhunen-Lo\'{e}ve expansions (CKLEs). CKLEs are constructed by conditioning the covariance kernel of the unconditional expansion on the direct measurements via Gaussian process regression and then truncating the corresponding KLE. We apply the proposed methodology to constructing surrogate models via the Basis Adaptation (BA) method of the stationary hydraulic head response, measured at spatially discrete observation locations, of a groundwater flow model of the Hanford Site, as a function of the 1,000-dimensional representation of the model's log-transmissivity field. We find that BA surrogate models of the hydraulic head based on CKLEs are more accurate than BA surrogate models based on unconditional expansions for forward uncertainty quantification tasks. Furthermore, we find that inverse estimates of the hydraulic transmissivity field computed using CKLE-based BA surrogate models are more accurate than those computed using unconditional BA surrogate models.
    Towards a safe MLOps Process for the Continuous Development and Safety Assurance of ML-based Systems in the Railway Domain. (arXiv:2307.02867v1 [cs.SE])
    Traditional automation technologies alone are not sufficient to enable driverless operation of trains (called Grade of Automation (GoA) 4) on non-restricted infrastructure. The required perception tasks are nowadays realized using Machine Learning (ML) and thus need to be developed and deployed reliably and efficiently. One important aspect of achieving this is to use an MLOps process that improves reproducibility, traceability, and collaboration, and enables continuous adaptation of driverless operation to changing conditions. MLOps mixes ML application development and operation (Ops) and enables high-frequency software releases and continuous innovation based on feedback from operations. In this paper, we outline a safe MLOps process for the continuous development and safety assurance of ML-based systems in the railway domain. It integrates system engineering, safety assurance, and the ML life-cycle in a comprehensive workflow. We present the individual stages of the process and their interactions. Moreover, we describe relevant challenges in automating the different stages of the safe MLOps process.
    OpenDelta: A Plug-and-play Library for Parameter-efficient Adaptation of Pre-trained Models. (arXiv:2307.03084v1 [cs.LG])
    The scale of large pre-trained models (PTMs) poses significant challenges in adapting to downstream tasks due to the high optimization overhead and storage costs associated with full-parameter fine-tuning. To address this, many studies explore parameter-efficient tuning methods, also framed as "delta tuning", which updates only a small subset of parameters, known as "delta modules", while keeping the backbone model's parameters fixed. However, the practicality and flexibility of delta tuning have been limited due to existing implementations that directly modify the code of the backbone PTMs and hard-code specific delta tuning methods for each PTM. In this paper, we present OpenDelta, an open-source library that overcomes these limitations by providing a plug-and-play implementation of various delta tuning methods. Our novel techniques eliminate the need to modify the backbone PTMs' code, making OpenDelta compatible with different, even novel PTMs. OpenDelta is designed to be simple, modular, and extensible, providing a comprehensive platform for researchers and practitioners to adapt large PTMs efficiently.
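    Conceptually, a delta module adds a small trainable path next to a frozen backbone computation. The toy sketch below shows a LoRA-style low-rank delta attached to a single linear map; it illustrates the delta-tuning idea only and is not OpenDelta's actual API.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

class LoRADelta:
    # Minimal delta module: the backbone weight W stays frozen; only the
    # low-rank factors A (r x d) and B (d_out x r) would receive gradients.
    def __init__(self, W, A, B):
        self.W, self.A, self.B = W, A, B

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen backbone path
        delta = matvec(self.B, matvec(self.A, x))   # trainable low-rank path
        return [b + d for b, d in zip(base, delta)]

layer = LoRADelta(W=[[1.0, 0.0], [0.0, 1.0]],
                  A=[[1.0, 1.0]],    # rank-1 down-projection
                  B=[[0.5], [0.5]])  # up-projection
y = layer.forward([2.0, 3.0])
```

    Because the delta path is additive and the backbone is untouched, such a module can in principle be attached to any linear submodule without modifying the backbone's code, which is the plug-and-play property the library targets.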
    Learning Low-Dimensional Nonlinear Structures from High-Dimensional Noisy Data: An Integral Operator Approach. (arXiv:2203.00126v2 [stat.ML] UPDATED)
    We propose a kernel-spectral embedding algorithm for learning low-dimensional nonlinear structures from high-dimensional and noisy observations, where the datasets are assumed to be sampled from an intrinsically low-dimensional manifold and corrupted by high-dimensional noise. The algorithm employs an adaptive bandwidth selection procedure which does not rely on prior knowledge of the underlying manifold. The obtained low-dimensional embeddings can be further utilized for downstream purposes such as data visualization, clustering and prediction. Our method is theoretically justified and practically interpretable. Specifically, we establish the convergence of the final embeddings to their noiseless counterparts when the dimension and size of the samples are comparably large, and characterize the effect of the signal-to-noise ratio on the rate of convergence and phase transition. We also prove convergence of the embeddings to the eigenfunctions of an integral operator defined by the kernel map of some reproducing kernel Hilbert space capturing the underlying nonlinear structures. Numerical simulations and analysis of three real datasets show the superior empirical performance of the proposed method, compared to many existing methods, on learning various manifolds in diverse applications.
    Track Mix Generation on Music Streaming Services using Transformers. (arXiv:2307.03045v1 [cs.IR])
    This paper introduces Track Mix, a personalized playlist generation system released in 2022 on the music streaming service Deezer. Track Mix automatically generates "mix" playlists inspired by initial music tracks, allowing users to discover music similar to their favorite content. To generate these mixes, we consider a Transformer model trained on millions of track sequences from user playlists. In light of the growing popularity of Transformers in recent years, we analyze the advantages, drawbacks, and technical challenges of using such a model for mix generation on the service, compared to a more traditional collaborative filtering approach. Since its release, Track Mix has been generating playlists for millions of users daily, enhancing their music discovery experience on Deezer.
    FREEDOM: Target Label & Source Data & Domain Information-Free Multi-Source Domain Adaptation for Unsupervised Personalization. (arXiv:2307.02493v1 [cs.LG])
    From a service perspective, Multi-Source Domain Adaptation (MSDA) is a promising scenario to adapt a deployed model to a client's dataset. It can provide adaptation without a target label and support the case where a source dataset is constructed from multiple domains. However, it is impractical in that its training heavily relies on prior domain information about the multi-source dataset -- how many domains exist and the domain label of each data sample. Moreover, MSDA requires both source and target datasets simultaneously (physically), causing storage limitations on the client device or data privacy issues by transferring client data to a server. For a more practical scenario of model adaptation from a service provider's point of view, we relax these constraints and present a novel problem scenario of Three-Free Domain Adaptation, namely TFDA, where 1) target labels, 2) the source dataset, and, most importantly, 3) source domain information (domain labels + the number of domains) are unavailable. Under this problem scenario, we propose a practical adaptation framework called FREEDOM. It leverages the power of the generative model, disentangling data into class and style aspects, where the style is defined as the class-independent information from the source data and designed with a nonparametric Bayesian approach. In the adaptation stage, FREEDOM aims to match the source class distribution with the target's under the philosophy that class distribution is consistent even if the style is different; afterward, only part of the classification model is deployed as a personalized network. As a result, FREEDOM achieves state-of-the-art or comparable performance even without domain information, with reduced final model size on the target side, independent of the number of source domains.
    TransformerG2G: Adaptive time-stepping for learning temporal graph embeddings using transformers. (arXiv:2307.02588v1 [cs.LG])
    Dynamic graph embedding has emerged as a very effective technique for addressing diverse temporal graph analytic tasks (i.e., link prediction, node classification, recommender systems, anomaly detection, and graph generation) in various applications. Such temporal graphs exhibit heterogeneous transient dynamics, varying time intervals, and highly evolving node features throughout their evolution. Hence, incorporating long-range dependencies from the historical graph context plays a crucial role in accurately learning their temporal dynamics. In this paper, we develop a graph embedding model with uncertainty quantification, TransformerG2G, by exploiting the advanced transformer encoder to first learn intermediate node representations from its current state ($t$) and previous context (over timestamps $t-1, \ldots, t-l$, where $l$ is the context length). Moreover, we employ two projection layers to generate lower-dimensional multivariate Gaussian distributions as each node's latent embedding at timestamp $t$. We consider diverse benchmarks with varying levels of ``novelty'' as measured by the TEA plots. Our experiments demonstrate that the proposed TransformerG2G model outperforms conventional multi-step methods and our prior work (DynG2G) in terms of both link prediction accuracy and computational efficiency, especially for a high degree of novelty. Furthermore, the learned time-dependent attention weights across multiple graph snapshots reveal the development of an automatic adaptive time stepping enabled by the transformer. Importantly, by examining the attention weights, we can uncover temporal dependencies, identify influential elements, and gain insights into the complex interactions within the graph structure. For example, we identified a strong correlation between attention weights and node degree at the various stages of the graph topology evolution.
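    The projection step described above can be sketched as two linear heads producing the mean and a strictly positive scale of a diagonal Gaussian node embedding. The weights below are toy values, and exp() stands in for whatever positivity map the model actually uses:

```python
import math

def gaussian_embedding(h, W_mu, W_sigma):
    # Two projection heads map the transformer's node representation h to
    # the mean and (positive) scale of a diagonal Gaussian embedding.
    mu = [sum(w * x for w, x in zip(row, h)) for row in W_mu]
    # exp() is one common way to keep the scale strictly positive.
    sigma = [math.exp(sum(w * x for w, x in zip(row, h))) for row in W_sigma]
    return mu, sigma

mu, sigma = gaussian_embedding([1.0, 0.0],
                               W_mu=[[2.0, 0.0]],
                               W_sigma=[[0.0, 0.0]])
```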
    Unleashing the Power of Electrocardiograms: A novel approach for Patient Identification in Healthcare Systems with ECG Signals. (arXiv:2302.06529v2 [cs.LG] UPDATED)
    Over the course of the past two decades, a substantial body of research has substantiated the viability of utilising cardiac signals as a biometric modality. This paper presents a novel approach for patient identification in healthcare systems using electrocardiogram signals. A convolutional neural network is used to classify users based on images extracted from ECG signals. The proposed identification system is evaluated in multiple databases, providing a comprehensive understanding of its potential in real-world scenarios. The impact of Cardiovascular Diseases on generic user identification has been largely overlooked in previous studies. The presented method takes into account the cardiovascular condition of the patients, ensuring that the results obtained are not biased or limited. Furthermore, the results obtained are consistent and reliable, with lower error rates and higher accuracy metrics, as demonstrated through extensive experimentation. All these features make the proposed method a valuable contribution to the field of patient identification in healthcare systems, and make it a strong contender for practical applications.
    Co-design Hardware and Algorithm for Vector Search. (arXiv:2306.11182v3 [cs.LG] UPDATED)
    Vector search has emerged as the foundation for large-scale information retrieval and machine learning systems, with search engines like Google and Bing processing tens of thousands of queries per second on petabyte-scale document datasets by evaluating vector similarities between encoded query texts and web documents. As performance demands for vector search systems surge, accelerated hardware offers a promising solution in the post-Moore's Law era. We introduce \textit{FANNS}, an end-to-end and scalable vector search framework on FPGAs. Given a user-provided recall requirement on a dataset and a hardware resource budget, \textit{FANNS} automatically co-designs hardware and algorithm, subsequently generating the corresponding accelerator. The framework also supports scale-out by incorporating a hardware TCP/IP stack in the accelerator. \textit{FANNS} attains up to 23.0$\times$ and 37.2$\times$ speedup compared to FPGA and CPU baselines, respectively, and demonstrates superior scalability to GPUs, achieving 5.5$\times$ and 7.6$\times$ speedup in median and 95\textsuperscript{th} percentile (P95) latency within an eight-accelerator configuration. The remarkable performance of \textit{FANNS} lays a robust groundwork for future FPGA integration in data centers and AI supercomputers.
    Conditional independence testing under model misspecification. (arXiv:2307.02520v1 [stat.ML])
    Conditional independence (CI) testing is fundamental and challenging in modern statistics and machine learning. Many modern methods for CI testing rely on powerful supervised learning methods to learn regression functions or Bayes predictors as an intermediate step. Although the methods are guaranteed to control Type-I error when the supervised learning methods accurately estimate the regression functions or Bayes predictors, their behavior is less understood when they fail due to model misspecification. In a broader sense, model misspecification can arise even when universal approximators (e.g., deep neural nets) are employed. Motivated by this, we study the performance of regression-based CI tests under model misspecification. Namely, we propose new approximations or upper bounds for the testing errors of three regression-based tests that depend on misspecification errors. Moreover, we introduce the Rao-Blackwellized Predictor Test (RBPT), a novel regression-based CI test robust against model misspecification. Finally, we conduct experiments with artificial and real data, showcasing the usefulness of our theory and methods.
    Stacking of Hyperparameter Tuned Models for Tagging Coding Problems. (arXiv:2306.10077v2 [cs.LG] UPDATED)
    Coding problems are problems that require a solution in the form of a computer program. Coding problems are popular among students and professionals as they enhance their skills and career opportunities. An AI system that helps those who practice coding problems would be highly useful, and there is huge potential for such a system. In this work, we propose a model which uses stacking of hyperparameter-tuned boosting models to achieve impressive metric scores of 77.8% accuracy and 0.815 PR-AUC on a dataset scraped from Codeforces and Leetcode. We open-source the dataset and the models developed for this work.
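    The stacking structure can be sketched in a few lines: level-0 base models (each hyperparameter-tuned separately) score an input, and a level-1 meta-model combines their scores into the final tag prediction. The base scorers and the averaging meta-model below are hypothetical stand-ins for the paper's tuned boosting models:

```python
def stack_predict(base_models, meta_model, x):
    # Level-0: each tuned base model scores the problem statement x.
    # Level-1: the meta-model combines those scores.
    level0 = [m(x) for m in base_models]
    return meta_model(level0)

# Hypothetical base scorers for a "graphs" tag, plus an averaging meta-model.
base_models = [lambda x: 0.9 if "graph" in x else 0.2,
               lambda x: 0.7 if "graph" in x else 0.1]
meta_model = lambda scores: sum(scores) / len(scores)

p = stack_predict(base_models, meta_model, "shortest path in a graph")
```

    In practice the meta-model would itself be trained on held-out level-0 predictions rather than fixed to an average.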
    A Real-time Human Pose Estimation Approach for Optimal Sensor Placement in Sensor-based Human Activity Recognition. (arXiv:2307.02906v1 [cs.LG])
    Sensor-based Human Activity Recognition facilitates unobtrusive monitoring of human movements. However, determining the most effective sensor placement for optimal classification performance remains challenging. This paper introduces a novel methodology to resolve this issue, using real-time 2D pose estimations derived from video recordings of target activities. The derived skeleton data provides a unique strategy for identifying the optimal sensor location. We validate our approach through a feasibility study, applying inertial sensors to monitor 13 different activities across ten subjects. Our findings indicate that the vision-based method for sensor placement offers comparable results to the conventional deep learning approach, demonstrating its efficacy. This research significantly advances the field of Human Activity Recognition by providing a lightweight, on-device solution for determining the optimal sensor placement, thereby enhancing data anonymization and supporting a multimodal classification approach.
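    One way to read the proposed strategy is as a motion-based ranking over pose keypoints: the joint whose 2D trajectory moves most during the target activities is a natural candidate for sensor placement. The heuristic below is a simplified sketch of that idea with made-up keypoints, not the paper's exact criterion:

```python
def best_sensor_joint(trajectories):
    # trajectories: {joint: [(x, y), ...]}  2D pose keypoints over time.
    # Heuristic: place the inertial sensor at the joint whose keypoint
    # trajectory shows the largest spread (i.e., the most motion).
    def spread(points):
        n = len(points)
        mx = sum(p[0] for p in points) / n
        my = sum(p[1] for p in points) / n
        return sum((x - mx) ** 2 + (y - my) ** 2 for x, y in points) / n
    return max(trajectories, key=lambda j: spread(trajectories[j]))

poses = {"wrist": [(0, 0), (4, 1), (0, 3), (5, 0)],   # moves a lot
         "hip":   [(1, 1), (1, 1), (1, 2), (1, 1)]}   # nearly static
```

    Because the ranking uses only skeleton data from video, no sensors need to be worn during the placement-selection phase.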
    Sandwiched Video Compression: Efficiently Extending the Reach of Standard Codecs with Neural Wrappers. (arXiv:2303.11473v2 [eess.IV] UPDATED)
    We propose sandwiched video compression -- a video compression system that wraps neural networks around a standard video codec. The sandwich framework consists of a neural pre- and post-processor with a standard video codec between them. The networks are trained jointly to optimize a rate-distortion loss function with the goal of significantly improving over the standard codec in various compression scenarios. End-to-end training in this setting requires a differentiable proxy for the standard video codec, which incorporates temporal processing with motion compensation, inter/intra mode decisions, and in-loop filtering. We propose differentiable approximations to key video codec components and demonstrate that, in addition to providing meaningful compression improvements over the standard codec, the neural codes of the sandwich lead to significantly better rate-distortion performance in two important scenarios. When transporting high-resolution video via low-resolution HEVC, the sandwich system obtains 6.5 dB improvements over standard HEVC. More importantly, using the well-known perceptual similarity metric, LPIPS, we observe 30% improvements in rate at the same quality over HEVC. Last but not least, we show that pre- and post-processors formed by very modestly-parameterized, light-weight networks can closely approximate these results.
    ShadowNet: A Secure and Efficient On-device Model Inference System for Convolutional Neural Networks. (arXiv:2011.05905v4 [cs.CR] UPDATED)
    With the increased usage of AI accelerators on mobile and edge devices, on-device machine learning (ML) is gaining popularity. Thousands of proprietary ML models are being deployed today on billions of untrusted devices. This raises serious security concerns about model privacy. However, protecting model privacy without losing access to the untrusted AI accelerators is a challenging problem. In this paper, we present a novel on-device model inference system, ShadowNet. ShadowNet protects the model privacy with Trusted Execution Environment (TEE) while securely outsourcing the heavy linear layers of the model to the untrusted hardware accelerators. ShadowNet achieves this by transforming the weights of the linear layers before outsourcing them and restoring the results inside the TEE. The non-linear layers are also kept secure inside the TEE. ShadowNet's design ensures efficient transformation of the weights and the subsequent restoration of the results. We build a ShadowNet prototype based on TensorFlow Lite and evaluate it on five popular CNNs, namely, MobileNet, ResNet-44, MiniVGG, ResNet-404, and YOLOv4-tiny. Our evaluation shows that ShadowNet achieves strong security guarantees with reasonable performance, offering a practical solution for secure on-device model inference.
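    The outsourcing pattern can be illustrated with a toy masking scheme: permute and scale the rows of a linear layer inside the TEE, let the untrusted accelerator do the heavy matrix-vector product on the masked weights, then undo the transformation inside the TEE. This sketch conveys the structure only; ShadowNet's actual weight transformation is more involved:

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def mask_weights(W, perm, scales):
    # TEE-side: permute and scale rows before handing W to the untrusted
    # accelerator, so the plaintext weights never leave the TEE.
    return [[scales[i] * w for w in W[perm[i]]] for i in range(len(W))]

def restore_output(y_masked, perm, scales):
    # TEE-side: undo the scaling and permutation on the returned result.
    y = [0.0] * len(y_masked)
    for i in range(len(perm)):
        y[perm[i]] = y_masked[i] / scales[i]
    return y

W = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, 1.0]
perm, scales = [1, 0], [2.0, 4.0]
y_untrusted = matvec(mask_weights(W, perm, scales), x)  # runs outside the TEE
y = restore_output(y_untrusted, perm, scales)
```

    The restored output matches the unmasked computation, while the accelerator only ever sees transformed weights.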
    Effects of data time lag in a decision-making system using machine learning for pork price prediction. (arXiv:2305.05677v2 [cs.LG] UPDATED)
    Spain is the third-largest producer of pork meat in the world, and many farms in several regions depend on the evolution of this market. However, the current pricing system is unfair, as some actors have better market information than others. In this context, historical pricing is an easy-to-find and affordable data source that can help all agents to be better informed. However, the time lag in data acquisition can affect their pricing decisions. In this paper, we study the effect that data acquisition delay has on a price prediction system using multiple prediction algorithms. We describe the integration of the best proposal into a decision support system prototype and test it in a real-case scenario. Specifically, we use public data from the most important regional pork meat markets in Spain published by the Ministry of Agriculture with a two-week delay and subscription-based data of the same markets obtained on the same day. The results show that the error difference between the best public and data subscription models is 0.6 Euro cents in favor of the data without delay. The market dimension makes these differences significant in the supply chain, giving pricing agents a better tool to negotiate market prices.
    ContainerGym: A Real-World Reinforcement Learning Benchmark for Resource Allocation. (arXiv:2307.02991v1 [cs.LG])
    We present ContainerGym, a benchmark for reinforcement learning inspired by a real-world industrial resource allocation task. The proposed benchmark encodes a range of challenges commonly encountered in real-world sequential decision making problems, such as uncertainty. It can be configured to instantiate problems of varying degrees of difficulty, e.g., in terms of variable dimensionality. Our benchmark differs from other reinforcement learning benchmarks, including the ones aiming to encode real-world difficulties, in that it is directly derived from a real-world industrial problem, which underwent minimal simplification and streamlining. It is sufficiently versatile to evaluate reinforcement learning algorithms on any real-world problem that fits our resource allocation framework. We provide results of standard baseline methods. Going beyond the usual training reward curves, our results and the statistical tools used to interpret them allow us to highlight interesting limitations of well-known deep reinforcement learning algorithms, namely PPO, TRPO and DQN.
    Context-Aware Configuration and Management of WiFi Direct Groups for Real Opportunistic Networks. (arXiv:2307.03126v1 [cs.NI])
    Wi-Fi Direct is a promising technology for the support of device-to-device communications (D2D) on commercial mobile devices. However, the standard as-it-is is not sufficient to support the real deployment of networking solutions entirely based on D2D such as opportunistic networks. In fact, WiFi Direct presents some characteristics that could limit the autonomous creation of D2D connections among users' personal devices. Specifically, the standard explicitly requires the user's authorization to establish a connection between two or more devices, and it provides a limited support for inter-group communication. In some cases, this might lead to the creation of isolated groups of nodes which cannot communicate among each other. In this paper, we propose a novel middleware-layer protocol for the efficient configuration and management of WiFi Direct groups (WiFi Direct Group Manager, WFD-GM) to enable autonomous connections and inter-group communication. This enables opportunistic networks in real conditions (e.g., variable mobility and network size). WFD-GM defines a context function that takes into account heterogeneous parameters for the creation of the best group configuration in a specific time window, including an index of nodes' stability and power levels. We evaluate the protocol's performance by simulating three reference scenarios including different mobility models, geographical areas, and numbers of nodes. Simulations are also supported by experimental results related to the evaluation in a real testbed of the involved context parameters. We compare WFD-GM with the state-of-the-art solutions and we show that it performs significantly better than a Baseline approach in scenarios with medium/low mobility, and it is comparable with it in case of high mobility, without introducing additional overhead.
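    The context function can be pictured as a weighted score over per-node parameters such as the stability index and power level, used to pick the best group configuration (e.g., which node should act as group owner). The fields and weights below are purely illustrative, not the values WFD-GM actually uses:

```python
def context_score(node, w_stability=0.6, w_battery=0.4):
    # Hypothetical context function: combine a node's mobility-stability
    # index and normalized battery level into a single score.
    return w_stability * node["stability"] + w_battery * node["battery"]

def pick_group_owner(nodes):
    # Choose the node with the highest context score as group owner.
    return max(nodes, key=lambda name: context_score(nodes[name]))

nodes = {"a": {"stability": 0.9, "battery": 0.5},
         "b": {"stability": 0.4, "battery": 0.9}}
```

    With these illustrative weights, the more stable node wins even though it has less battery, reflecting the intuition that a stable group owner reduces reconfiguration overhead.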
    UTRNet: High-Resolution Urdu Text Recognition In Printed Documents. (arXiv:2306.15782v2 [cs.CV] UPDATED)
    In this paper, we propose a novel approach to address the challenges of printed Urdu text recognition using high-resolution, multi-scale semantic feature extraction. Our proposed UTRNet architecture, a hybrid CNN-RNN model, demonstrates state-of-the-art performance on benchmark datasets. To address the limitations of previous works, which struggle to generalize to the intricacies of the Urdu script and the lack of sufficient annotated real-world data, we have introduced UTRSet-Real, a large-scale annotated real-world dataset comprising over 11,000 lines, and UTRSet-Synth, a synthetic dataset with 20,000 lines closely resembling real-world data. We have also made corrections to the ground truth of the existing IIITH dataset, making it a more reliable resource for future research. We also provide UrduDoc, a benchmark dataset for Urdu text line detection in scanned documents. Additionally, we have developed an online tool for end-to-end Urdu OCR from printed documents by integrating UTRNet with a text detection model. Our work not only addresses the current limitations of Urdu OCR but also paves the way for future research in this area and facilitates the continued advancement of Urdu OCR technology. The project page with source code, datasets, annotations, trained models, and online tool is available at abdur75648.github.io/UTRNet.
    How to Detect Unauthorized Data Usages in Text-to-image Diffusion Models. (arXiv:2307.03108v1 [cs.CV])
    Recent text-to-image diffusion models have shown surprising performance in generating high-quality images. However, concerns have arisen regarding the unauthorized usage of data during the training process. One example is when a model trainer collects a set of images created by a particular artist and attempts to train a model capable of generating similar images without obtaining permission from the artist. To address this issue, it becomes crucial to detect unauthorized data usage. In this paper, we propose a method for detecting such unauthorized data usage by planting injected memorization into the text-to-image diffusion models trained on the protected dataset. Specifically, we modify the protected image dataset by adding unique content to the images, such as stealthy image warping functions, that is imperceptible to human vision but can be captured and memorized by diffusion models. By analyzing whether the model has memorization for the injected content (i.e., whether the generated images are processed by the chosen post-processing function), we can detect models that had illegally utilized the unauthorized data. Our experiments conducted on Stable Diffusion and LoRA models demonstrate the effectiveness of the proposed method in detecting unauthorized data usages.
    FLuID: Mitigating Stragglers in Federated Learning using Invariant Dropout. (arXiv:2307.02623v1 [cs.LG])
    Federated Learning (FL) allows machine learning models to train locally on individual mobile devices, synchronizing model updates via a shared server. This approach safeguards user privacy; however, it also generates a heterogeneous training environment due to the varying performance capabilities across devices. As a result, straggler devices with lower performance often dictate the overall training time in FL. In this work, we aim to alleviate this performance bottleneck due to stragglers by dynamically balancing the training load across the system. We introduce Invariant Dropout, a method that extracts a sub-model based on the weight update threshold, thereby minimizing potential impacts on accuracy. Building on this dropout technique, we develop an adaptive training framework, Federated Learning using Invariant Dropout (FLuID). FLuID offers a lightweight sub-model extraction to regulate computational intensity, thereby reducing the load on straggler devices without affecting model quality. Our method leverages neuron updates from non-straggler devices to construct a tailored sub-model for each straggler based on client performance profiling. Furthermore, FLuID can dynamically adapt to changes in stragglers as runtime conditions shift. We evaluate FLuID using five real-world mobile clients. The evaluations show that Invariant Dropout maintains baseline model efficiency while alleviating the performance bottleneck of stragglers through a dynamic, runtime approach.
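    The sub-model extraction step can be sketched as a simple thresholding of weight updates: parameters whose updates stay below the threshold are treated as (near-)invariant and dropped from the straggler's sub-model, while the rest are kept. A minimal sketch of that selection rule with made-up numbers (the paper's actual criterion operates on neuron updates profiled from non-straggler devices):

```python
def invariant_dropout_mask(prev_w, new_w, threshold):
    # Keep parameters whose update magnitude reaches the threshold;
    # near-invariant ones are dropped from the straggler's sub-model.
    return [abs(n - p) >= threshold for p, n in zip(prev_w, new_w)]

keep = invariant_dropout_mask(prev_w=[0.10, 0.20, 0.30],
                              new_w=[0.11, 0.55, 0.30],
                              threshold=0.05)
```

    Only the second parameter changed meaningfully, so only it would remain in the straggler's reduced model, cutting its compute load.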
    Learning to Solve Tasks with Exploring Prior Behaviours. (arXiv:2307.02889v1 [cs.RO])
    Demonstrations are widely used in Deep Reinforcement Learning (DRL) for facilitating solving tasks with sparse rewards. However, the tasks in real-world scenarios can often have varied initial conditions from the demonstration, which would require additional prior behaviours. For example, consider we are given the demonstration for the task of \emph{picking up an object from an open drawer}, but the drawer is closed in the training. Without acquiring the prior behaviours of opening the drawer, the robot is unlikely to solve the task. To address this, in this paper we propose an Intrinsic Rewards Driven Example-based Control \textbf{(IRDEC)}. Our method can endow agents with the ability to explore and acquire the required prior behaviours and then connect to the task-specific behaviours in the demonstration to solve sparse-reward tasks without requiring additional demonstration of the prior behaviours. The performance of our method outperforms other baselines on three navigation tasks and one robotic manipulation task with sparse rewards. Codes are available at https://github.com/Ricky-Zhu/IRDEC.
    PLIERS: a Popularity-Based Recommender System for Content Dissemination in Online Social Networks. (arXiv:2307.02865v1 [cs.IR])
    In this paper, we propose a novel tag-based recommender system called PLIERS, which relies on the assumption that users are mainly interested in items and tags with similar popularity to those they already own. PLIERS is aimed at reaching a good tradeoff between algorithmic complexity and the level of personalization of recommended items. To evaluate PLIERS, we performed a set of experiments on real OSN datasets, demonstrating that it outperforms state-of-the-art solutions in terms of personalization, relevance, and novelty of recommendations.
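    PLIERS's core assumption translates into a very cheap ranking rule: score candidate items by how close their global popularity is to the popularity of what the user already owns. A minimal sketch with a hypothetical popularity table (matching against the mean owned popularity is a simplification of the actual algorithm):

```python
def recommend_by_popularity(user_items, popularity, k=2):
    # Rank candidates by closeness of their popularity to the mean
    # popularity of the user's own items.
    target = sum(popularity[i] for i in user_items) / len(user_items)
    candidates = [i for i in popularity if i not in user_items]
    return sorted(candidates, key=lambda i: abs(popularity[i] - target))[:k]

popularity = {"a": 100, "b": 120, "c": 5000, "d": 90, "e": 3}
recs = recommend_by_popularity({"a", "b"}, popularity)
```

    Viral item "c" is ranked last despite its popularity, illustrating how popularity matching trades raw popularity for personalization.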
    Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations. (arXiv:2206.04779v3 [cs.LG] UPDATED)
    Offline reinforcement learning has shown great promise in leveraging large pre-collected datasets for policy learning, allowing agents to forgo often-expensive online data collection. However, offline reinforcement learning from visual observations with continuous action spaces remains under-explored, with a limited understanding of the key challenges in this complex domain. In this paper, we establish simple baselines for continuous control in the visual domain and introduce a suite of benchmarking tasks for offline reinforcement learning from visual observations designed to better represent the data distributions present in real-world offline RL problems and guided by a set of desiderata for offline RL from visual observations, including robustness to visual distractions and visually identifiable changes in dynamics. Using this suite of benchmarking tasks, we show that simple modifications to two popular vision-based online reinforcement learning algorithms, DreamerV2 and DrQ-v2, suffice to outperform existing offline RL methods and establish competitive baselines for continuous control in the visual domain. We rigorously evaluate these algorithms and perform an empirical evaluation of the differences between state-of-the-art model-based and model-free offline RL methods for continuous control from visual observations. All code and data used in this evaluation are open-sourced to facilitate progress in this domain.
    Active Policy Improvement from Multiple Black-box Oracles. (arXiv:2306.10259v2 [cs.LG] UPDATED)
    Reinforcement learning (RL) has made significant strides in various complex domains. However, identifying an effective policy via RL often necessitates extensive exploration. Imitation learning aims to mitigate this issue by using expert demonstrations to guide exploration. In real-world scenarios, one often has access to multiple suboptimal black-box experts, rather than a single optimal oracle. These experts do not universally outperform each other across all states, presenting a challenge in actively deciding which oracle to use and in which state. We introduce MAPS and MAPS-SE, a class of policy improvement algorithms that perform imitation learning from multiple suboptimal oracles. In particular, MAPS actively selects which of the oracles to imitate and improve their value function estimates, and MAPS-SE additionally leverages an active state exploration criterion to determine which states one should explore. We provide a comprehensive theoretical analysis and demonstrate that MAPS and MAPS-SE enjoy sample efficiency advantage over the state-of-the-art policy improvement algorithms. Empirical results show that MAPS-SE significantly accelerates policy optimization via state-wise imitation learning from multiple oracles across a broad spectrum of control tasks in the DeepMind Control Suite. Our code is publicly available at: https://github.com/ripl/maps.
    Statistical-Computational Tradeoffs in Mixed Sparse Linear Regression. (arXiv:2303.02118v2 [stat.ML] UPDATED)
    We consider the problem of mixed sparse linear regression with two components, where two real $k$-sparse signals $\beta_1, \beta_2$ are to be recovered from $n$ unlabelled noisy linear measurements. The sparsity is allowed to be sublinear in the dimension, and additive noise is assumed to be independent Gaussian with variance $\sigma^2$. Prior work has shown that the problem suffers from a $\frac{k}{SNR^2}$-to-$\frac{k^2}{SNR^2}$ statistical-to-computational gap, resembling other computationally challenging high-dimensional inference problems such as Sparse PCA and Robust Sparse Mean Estimation; here $SNR$ is the signal-to-noise ratio. We establish the existence of a more extensive computational barrier for this problem through the method of low-degree polynomials, but show that the problem is computationally hard only in a very narrow symmetric parameter regime. We identify a smooth information-computation tradeoff between the sample complexity $n$ and runtime for any randomized algorithm in this hard regime. Via a simple reduction, this provides novel rigorous evidence for the existence of a computational barrier to solving exact support recovery in sparse phase retrieval with sample complexity $n = \tilde{o}(k^2)$. Our second contribution is to analyze a simple thresholding algorithm which, outside of the narrow regime where the problem is hard, solves the associated mixed regression detection problem in $O(np)$ time with square-root the number of samples and matches the sample complexity required for (non-mixed) sparse linear regression; this allows the recovery problem to be subsequently solved by state-of-the-art techniques from the dense case. As a special case of our results, we show that this simple algorithm is order-optimal among a large family of algorithms in solving exact signed support recovery in sparse linear regression.  ( 3 min )
    On the Noise Sensitivity of the Randomized SVD. (arXiv:2305.17435v2 [cs.IT] UPDATED)
    The randomized singular value decomposition (R-SVD) is a popular sketching-based algorithm for efficiently computing the partial SVD of a large matrix. When the matrix is low-rank, the R-SVD produces its partial SVD exactly; but when the rank is large, it only yields an approximation. Motivated by applications in data science and principal component analysis (PCA), we analyze the R-SVD under a low-rank signal plus noise measurement model; specifically, when its input is a spiked random matrix. The singular values produced by the R-SVD are shown to exhibit a BBP-like phase transition: when the SNR exceeds a certain detectability threshold that depends on the dimension reduction factor, the largest singular value is an outlier; below the threshold, no outlier emerges from the bulk of singular values. We further compute asymptotic formulas for the overlap between the ground truth signal singular vectors and the approximations produced by the R-SVD. Dimensionality reduction has the adverse effect of amplifying the noise in a highly nonlinear manner. Our results demonstrate the statistical advantage -- in both signal detection and estimation -- of the R-SVD over more naive sketched PCA variants; the advantage is especially dramatic when the sketching dimension is small. Our analysis is asymptotically exact, and substantially more fine-grained than existing operator-norm error bounds for the R-SVD, which largely fail to give meaningful error estimates in the moderate SNR regime. It applies to a broad family of sketching matrices previously considered in the literature, including Gaussian i.i.d. sketches, random projections, and the sub-sampled Hadamard transform, among others. Lastly, we derive an optimal singular value shrinker for singular values and vectors obtained through the R-SVD, which may be useful for applications in matrix denoising.  ( 3 min )
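The basic R-SVD analyzed above (in the style of Halko et al.) is short enough to sketch in full. This minimal version omits the power iterations often used in practice; the spiked signal-plus-noise demo mirrors the measurement model in the abstract:

```python
import numpy as np

def randomized_svd(A, rank, oversample=10, rng=None):
    """Basic randomized SVD: sketch the range with a Gaussian test
    matrix, then take an exact SVD of the small projected matrix."""
    rng = np.random.default_rng(rng)
    n = A.shape[1]
    Omega = rng.standard_normal((n, rank + oversample))
    Q, _ = np.linalg.qr(A @ Omega)          # orthonormal range basis
    U_hat, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ U_hat)[:, :rank], s[:rank], Vt[:rank]

# low-rank signal plus noise, as in the spiked model discussed above
rng = np.random.default_rng(1)
u = rng.standard_normal((500, 1))
v = rng.standard_normal((300, 1))
A = 5 * u @ v.T / np.sqrt(500 * 300) + rng.standard_normal((500, 300)) / np.sqrt(300)
U, s, Vt = randomized_svd(A, rank=1)
print(U.shape, s.shape)  # (500, 1) (1,)
```

When the input is exactly low-rank (rank at most the sketch size), this procedure recovers the partial SVD exactly, consistent with the abstract's remark.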
    Fairness in Multi-Task Learning via Wasserstein Barycenters. (arXiv:2306.10155v2 [stat.ML] UPDATED)
    Algorithmic Fairness is an established field in machine learning that aims to reduce biases in data. Recent advances have proposed various methods to ensure fairness in a univariate environment, where the goal is to de-bias a single task. However, extending fairness to a multi-task setting, where more than one objective is optimised using a shared representation, remains underexplored. To bridge this gap, we develop a method that extends the definition of Strong Demographic Parity to multi-task learning using multi-marginal Wasserstein barycenters. Our approach provides a closed form solution for the optimal fair multi-task predictor including both regression and binary classification tasks. We develop a data-driven estimation procedure for the solution and run numerical experiments on both synthetic and real datasets. The empirical results highlight the practical value of our post-processing methodology in promoting fair decision-making.  ( 2 min )
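To make the barycenter idea concrete, here is a 1-D, single-task sketch of the post-processing step: each group's scores are mapped through their empirical quantiles onto the group-size-weighted average of quantile functions (the Wasserstein barycenter on the line). This is an illustrative special case, not the paper's full multi-task estimator:

```python
import numpy as np

def barycenter_postprocess(scores, groups):
    """Enforce (Strong) Demographic Parity in 1-D: push every group's
    score distribution onto the Wasserstein barycenter, i.e. the
    group-size-weighted average of the groups' quantile functions."""
    qs = np.linspace(0, 1, 101)
    uniq = np.unique(groups)
    # barycenter quantile function = weighted average of group quantiles
    bary_q = sum((groups == g).mean() * np.quantile(scores[groups == g], qs)
                 for g in uniq)
    fair = np.empty_like(scores, dtype=float)
    for g in uniq:
        s = scores[groups == g]
        # within-group rank in [0, 1], then read off the barycenter quantile
        ranks = np.searchsorted(np.sort(s), s) / max(len(s) - 1, 1)
        fair[groups == g] = np.interp(ranks, qs, bary_q)
    return fair

rng = np.random.default_rng(0)
groups = np.r_[np.zeros(1000, int), np.ones(1000, int)]
scores = np.r_[rng.normal(0.3, 0.1, 1000), rng.normal(0.6, 0.2, 1000)]
fair = barycenter_postprocess(scores, groups)
# after post-processing, both groups share (approximately) one distribution
```

The map is monotone within each group, so it reorders no within-group predictions while removing the between-group shift.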
    A Finite-Particle Convergence Rate for Stein Variational Gradient Descent. (arXiv:2211.09721v4 [cs.LG] UPDATED)
    We provide the first finite-particle convergence rate for Stein variational gradient descent (SVGD), a popular algorithm for approximating a probability distribution with a collection of particles. Specifically, whenever the target distribution is sub-Gaussian with a Lipschitz score, SVGD with $n$ particles and an appropriate step size sequence drives the kernel Stein discrepancy to zero at an order $1/\sqrt{\log\log n}$ rate. We suspect that the dependence on $n$ can be improved, and we hope that our explicit, non-asymptotic proof strategy will serve as a template for future refinements.  ( 2 min )
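For readers unfamiliar with the algorithm being analyzed, a single SVGD particle update is easy to write down. The sketch below uses an RBF kernel with the common median-distance bandwidth heuristic and targets a standard normal (so the score is simply $-x$); step size and iteration count are illustrative choices:

```python
import numpy as np

def svgd_step(x, score, step=0.1):
    """One SVGD update: each particle moves along a kernel-weighted
    average of the score (attraction) plus a kernel-gradient term
    (repulsion, which keeps particles spread out)."""
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]
    sq = (diff ** 2).sum(-1)
    h = np.median(sq) / np.log(n + 1) + 1e-12   # median bandwidth heuristic
    K = np.exp(-sq / h)
    phi = (K @ score(x) + (2.0 / h) * (diff * K[..., None]).sum(axis=1)) / n
    return x + step * phi

score = lambda x: -x                       # score of a standard normal target
rng = np.random.default_rng(0)
x = rng.normal(5.0, 0.5, size=(50, 1))     # particles start far from the target
for _ in range(300):
    x = svgd_step(x, score)
# particles now approximate N(0, 1)
```

The kernel Stein discrepancy the abstract tracks measures exactly how far such a particle cloud is from the target distribution.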
    Descriptive vs. inferential community detection in networks: pitfalls, myths, and half-truths. (arXiv:2112.00183v7 [physics.soc-ph] UPDATED)
    Community detection is one of the most important methodological fields of network science, and one which has attracted a significant amount of attention over the past decades. This area deals with the automated division of a network into fundamental building blocks, with the objective of providing a summary of its large-scale structure. Despite its importance and widespread adoption, there is a noticeable gap between what is arguably the state-of-the-art and the methods that are actually used in practice in a variety of fields. Here we attempt to address this discrepancy by dividing existing methods according to whether they have a "descriptive" or an "inferential" goal. While descriptive methods find patterns in networks based on context-dependent notions of community structure, inferential methods articulate generative models, and attempt to fit them to data. In this way, they are able to provide insights into the mechanisms of network formation, and separate structure from randomness in a manner supported by statistical evidence. We review how employing descriptive methods with inferential aims is riddled with pitfalls and misleading answers, and thus should be in general avoided. We argue that inferential methods are more typically aligned with clearer scientific questions, yield more robust results, and should be in many cases preferred. We attempt to dispel some myths and half-truths often believed when community detection is employed in practice, in an effort to improve both the use of such methods as well as the interpretation of their results.  ( 3 min )
    Computable Stability for Persistence Rank Function Machine Learning. (arXiv:2307.02904v1 [math.AT])
    Persistent homology barcodes and diagrams are a cornerstone of topological data analysis. Widely used in many real data settings, they relate variation in topological information (as measured by cellular homology) with variation in data; however, they are challenging to use in statistical settings due to their complex geometric structure. In this paper, we revisit the persistent homology rank function -- an invariant measure of "shape" that was introduced before barcodes and persistence diagrams and captures the same information in a form that is more amenable to data and computation. In particular, since they are functions, techniques from functional data analysis -- a domain of statistics adapted for functions -- apply directly to persistent homology when represented by rank functions. Rank functions, however, have been less popular than barcodes because they face the challenge that stability -- a property that is crucial to validate their use in data analysis -- is difficult to guarantee, mainly due to metric concerns on rank function space. However, rank functions extend more naturally to the increasingly popular and important case of multiparameter persistent homology. In this paper, we study the performance of rank functions in functional inferential statistics and machine learning on both simulated and real data, and in both single and multiparameter persistent homology. We find that the use of persistent homology captured by rank functions offers a clear improvement over existing approaches. We then provide theoretical justification for our numerical experiments and applications to data by deriving several stability results for single- and multiparameter persistence rank functions under various metrics with the underlying aim of computational feasibility and interpretability.  ( 3 min )
    Learning Curves for Heterogeneous Feature-Subsampled Ridge Ensembles. (arXiv:2307.03176v1 [stat.ML])
    Feature bagging is a well-established ensembling method which aims to reduce prediction variance by training estimators in an ensemble on random subsamples or projections of features. Typically, ensembles are chosen to be homogeneous, in the sense that the number of feature dimensions available to an estimator is uniform across the ensemble. Here, we introduce heterogeneous feature ensembling, with estimators built on varying numbers of feature dimensions, and consider its performance in a linear regression setting. We study an ensemble of linear predictors, each fit using ridge regression on a subset of the available features. We allow the number of features included in these subsets to vary. Using the replica trick from statistical physics, we derive learning curves for ridge ensembles with deterministic linear masks. We obtain explicit expressions for the learning curves in the case of equicorrelated data with isotropic feature noise. Using the derived expressions, we investigate the effect of subsampling and ensembling, finding sharp transitions in the optimal ensembling strategy in the parameter space of noise level, data correlations, and data-task alignment. Finally, we suggest variable-dimension feature bagging as a strategy to mitigate double descent for robust machine learning in practice.  ( 2 min )
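The heterogeneous ensemble described above is simple to prototype: fit one ridge predictor per random feature subset, letting the subset sizes differ across members, and average the predictions. This is an illustrative sketch, not the paper's replica-theoretic setup:

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    # closed-form ridge solution (X^T X + lam I)^{-1} X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def heterogeneous_ridge_ensemble(X, y, subset_dims, lam=1.0, rng=None):
    """One ridge member per random feature subset; subset sizes may
    differ across members (the 'heterogeneous' case)."""
    rng = np.random.default_rng(rng)
    members = []
    for k in subset_dims:
        idx = rng.choice(X.shape[1], size=k, replace=False)
        members.append((idx, ridge_fit(X[:, idx], y, lam)))
    def predict(Xnew):
        # ensemble prediction = average over member predictions
        return np.mean([Xnew[:, idx] @ w for idx, w in members], axis=0)
    return predict

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 30))
w_true = rng.standard_normal(30)
y = X @ w_true + 0.1 * rng.standard_normal(200)
predict = heterogeneous_ridge_ensemble(X, y, subset_dims=[10, 20, 30], rng=0)
print(predict(X[:5]).shape)  # (5,)
```

Varying `subset_dims` is the knob whose optimal setting the learning curves in the paper characterize.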
    Understanding Uncertainty Sampling. (arXiv:2307.02719v1 [cs.LG])
    Uncertainty sampling is a prevalent active learning algorithm that sequentially queries annotations for the data samples about which the current prediction model is uncertain. However, the usage of uncertainty sampling has been largely heuristic: (i) there is no consensus on the proper definition of "uncertainty" for a specific task under a specific loss; (ii) there is no theoretical guarantee that prescribes a standard protocol to implement the algorithm, for example, how to handle the sequentially arrived annotated data under the framework of optimization algorithms such as stochastic gradient descent. In this work, we systematically examine uncertainty sampling algorithms under both stream-based and pool-based active learning. We propose a notion of equivalent loss which depends on the uncertainty measure used and the original loss function, and establish that an uncertainty sampling algorithm essentially optimizes against such an equivalent loss. This perspective verifies the properness of existing uncertainty measures from two aspects: surrogate property and loss convexity. Furthermore, we propose a new notion for designing uncertainty measures called \textit{loss as uncertainty}: the idea is to use the conditional expected loss given the features as the uncertainty measure. Such an uncertainty measure has nice analytical properties and enough generality to cover both classification and regression problems, enabling us to provide the first generalization bound for uncertainty sampling algorithms under both stream-based and pool-based settings, in full generality of the underlying model and problem. Lastly, we establish connections between certain variants of uncertainty sampling algorithms with risk-sensitive objectives and distributional robustness, which can partly explain the advantage of uncertainty sampling algorithms when the sample size is small.  ( 2 min )
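The pool-based variant studied above has a very short reference implementation. The sketch below uses least-confidence uncertainty (probability closest to 1/2) with a hand-rolled logistic model; the model, learning rate, and query budget are all illustrative choices:

```python
import numpy as np

def fit_logistic(X, y, steps=500, lr=0.5):
    # plain gradient descent on the logistic loss (no intercept term)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def uncertainty_sampling(X, y, n_init=5, n_queries=20, rng=None):
    """Pool-based least-confidence sampling: repeatedly refit the model
    and query the pool point whose predicted probability is nearest 1/2."""
    rng = np.random.default_rng(rng)
    labeled = list(rng.choice(len(X), n_init, replace=False))
    for _ in range(n_queries):
        w = fit_logistic(X[labeled], y[labeled])
        p = 1.0 / (1.0 + np.exp(-X @ w))
        uncertainty = -np.abs(p - 0.5)     # larger = more uncertain
        uncertainty[labeled] = -np.inf     # never re-query a labeled point
        labeled.append(int(np.argmax(uncertainty)))
    return labeled, fit_logistic(X[labeled], y[labeled])

rng = np.random.default_rng(0)
X = np.r_[rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))]
y = np.r_[np.zeros(100), np.ones(100)]
labeled, w = uncertainty_sampling(X, y, rng=0)
acc = float(np.mean((X @ w > 0) == y))     # accuracy with only 25 labels
```

In the paper's language, the least-confidence rule corresponds to one particular choice of uncertainty measure, and hence one equivalent loss.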
    Multiplicative Updates for Online Convex Optimization over Symmetric Cones. (arXiv:2307.03136v1 [math.OC])
    We study online convex optimization where the possible actions are trace-one elements in a symmetric cone, generalizing the extensively-studied experts setup and its quantum counterpart. Symmetric cones provide a unifying framework for some of the most important optimization models, including linear, second-order cone, and semidefinite optimization. Using tools from the field of Euclidean Jordan Algebras, we introduce the Symmetric-Cone Multiplicative Weights Update (SCMWU), a projection-free algorithm for online optimization over the trace-one slice of an arbitrary symmetric cone. We show that SCMWU is equivalent to Follow-the-Regularized-Leader and Online Mirror Descent with symmetric-cone negative entropy as regularizer. Using this structural result we show that SCMWU is a no-regret algorithm, and verify our theoretical results with extensive experiments. Our results unify and generalize the analysis for the Multiplicative Weights Update method over the probability simplex and the Matrix Multiplicative Weights Update method over the set of density matrices.  ( 2 min )
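The probability-simplex special case that SCMWU generalizes — the classical Multiplicative Weights Update over experts — fits in a few lines. The step size and loss sequence below are illustrative:

```python
import numpy as np

def mwu(loss_rounds, eta=0.1):
    """Multiplicative Weights Update over the probability simplex:
    exponentially down-weight experts by their losses, renormalize."""
    n = loss_rounds.shape[1]
    w = np.ones(n) / n
    total = 0.0
    for loss in loss_rounds:
        total += float(w @ loss)          # learner's expected loss this round
        w = w * np.exp(-eta * loss)       # multiplicative update
        w /= w.sum()                      # project back onto the simplex
    return total, w

rng = np.random.default_rng(0)
losses = rng.uniform(0, 1, (500, 4))
losses[:, 2] *= 0.5                       # expert 2 is best on average
total, w = mwu(losses)
print(int(np.argmax(w)))                  # weight concentrates on expert 2
```

SCMWU replaces the simplex with the trace-one slice of a symmetric cone (density matrices being the matrix case), recovering exactly this update when the cone is the nonnegative orthant.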
    PCL-Indexability and Whittle Index for Restless Bandits with General Observation Models. (arXiv:2307.03034v1 [stat.ML])
    In this paper, we consider a general observation model for restless multi-armed bandit problems. The operation of the player needs to be based on a certain feedback mechanism that is error-prone due to resource constraints or environmental or intrinsic noises. By establishing a general probabilistic model for the dynamics of feedback/observation, we formulate the problem as a restless bandit with a countable belief state space starting from an arbitrary initial belief (a priori information). We apply the achievable region method with partial conservation law (PCL) to the infinite-state problem and analyze its indexability and priority index (Whittle index). Finally, we propose an approximation process that transforms the problem into one to which the AG algorithm of Niño-Mora and Bertsimas for finite-state problems can be applied. Numerical experiments show that our algorithm has excellent performance.  ( 2 min )
    Evaluating the Evaluators: Are Current Few-Shot Learning Benchmarks Fit for Purpose?. (arXiv:2307.02732v1 [cs.LG])
    Numerous benchmarks for Few-Shot Learning have been proposed in the last decade. However, all of these benchmarks focus on performance averaged over many tasks, and the question of how to reliably evaluate and tune models trained for individual tasks in this regime has not been addressed. This paper presents the first investigation into task-level evaluation -- a fundamental step when deploying a model. We measure the accuracy of performance estimators in the few-shot setting, consider strategies for model selection, and examine the reasons for the failure of evaluators usually thought to be robust. We conclude that cross-validation with a low number of folds is the best choice for directly estimating the performance of a model, whereas using bootstrapping or cross-validation with a large number of folds is better for model selection purposes. Overall, we find that existing benchmarks for few-shot learning are not designed in such a way that one can get a reliable picture of how effectively methods can be used on individual tasks.  ( 2 min )
    The computational asymptotics of Gaussian variational inference and the Laplace approximation. (arXiv:2104.05886v3 [stat.CO] UPDATED)
    Gaussian variational inference and the Laplace approximation are popular alternatives to Markov chain Monte Carlo that formulate Bayesian posterior inference as an optimization problem, enabling the use of simple and scalable stochastic optimization algorithms. However, a key limitation of both methods is that the solution to the optimization problem is typically not tractable to compute; even in simple settings the problem is nonconvex. Thus, recently developed statistical guarantees -- which all involve the (data) asymptotic properties of the global optimum -- are not reliably obtained in practice. In this work, we provide two major contributions: a theoretical analysis of the asymptotic convexity properties of variational inference with a Gaussian family and the maximum a posteriori (MAP) problem required by the Laplace approximation; and two algorithms -- consistent Laplace approximation (CLA) and consistent stochastic variational inference (CSVI) -- that exploit these properties to find the optimal approximation in the asymptotic regime. Both CLA and CSVI involve a tractable initialization procedure that finds the local basin of the optimum, and CSVI further includes a scaled gradient descent algorithm that provably stays locally confined to that basin. Experiments on nonconvex synthetic and real-data examples show that compared with standard variational and Laplace approximations, both CSVI and CLA improve the likelihood of obtaining the global optimum of their respective optimization problems.  ( 3 min )
    Panel Data Nowcasting: The Case of Price-Earnings Ratios. (arXiv:2307.02673v1 [econ.EM])
    The paper uses structured machine learning regressions for nowcasting with panel data consisting of series sampled at different frequencies. Motivated by the problem of predicting corporate earnings for a large cross-section of firms with macroeconomic, financial, and news time series sampled at different frequencies, we focus on the sparse-group LASSO regularization which can take advantage of the mixed frequency time series panel data structures. Our empirical results show the superior performance of our machine learning panel data regression models over analysts' predictions, forecast combinations, firm-specific time series regression models, and standard machine learning methods.  ( 2 min )
    Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation. (arXiv:2307.02598v1 [cs.LG])
    We tackle the problems of latent variables identification and "out-of-support" image generation in representation learning. We show that both are possible for a class of decoders that we call additive, which are reminiscent of decoders used for object-centric representation learning (OCRL) and well suited for images that can be decomposed as a sum of object-specific images. We provide conditions under which exactly solving the reconstruction problem using an additive decoder is guaranteed to identify the blocks of latent variables up to permutation and block-wise invertible transformations. This guarantee relies only on very weak assumptions about the distribution of the latent factors, which might present statistical dependencies and have an almost arbitrarily shaped support. Our result provides a new setting where nonlinear independent component analysis (ICA) is possible and adds to our theoretical understanding of OCRL methods. We also show theoretically that additive decoders can generate novel images by recombining observed factors of variations in novel ways, an ability we refer to as Cartesian-product extrapolation. We show empirically that additivity is crucial for both identifiability and extrapolation on simulated data.  ( 2 min )
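The structural assumption behind the result — a decoder that renders each latent block separately and sums the per-block images — is easy to state in code. Below is a toy linear instance (real additive decoders would use nonlinear per-block networks); the class name and dimensions are illustrative:

```python
import numpy as np

class AdditiveDecoder:
    """Toy additive decoder: f(z) = sum_k f_k(z_k), where each latent
    block z_k is decoded independently and the 'object images' are summed.
    Linear per-block maps stand in for neural decoders here."""
    def __init__(self, block_dims, out_dim, rng=None):
        rng = np.random.default_rng(rng)
        self.blocks = [rng.standard_normal((out_dim, d)) for d in block_dims]

    def __call__(self, z_blocks):
        # each block contributes one additive component of the output
        return sum(W @ z for W, z in zip(self.blocks, z_blocks))

dec = AdditiveDecoder(block_dims=[2, 3], out_dim=16, rng=0)
z1, z2 = np.ones(2), np.ones(3)
# Cartesian-product extrapolation: recombine block values never seen together
x = dec([z1, 2 * z2])
print(x.shape)  # (16,)
```

Because the blocks never interact, any combination of per-block latent values yields a valid output — the mechanism behind the "Cartesian-product extrapolation" the abstract describes.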
    Conditional independence testing under model misspecification. (arXiv:2307.02520v1 [stat.ML])
    Conditional independence (CI) testing is fundamental and challenging in modern statistics and machine learning. Many modern methods for CI testing rely on powerful supervised learning methods to learn regression functions or Bayes predictors as an intermediate step. Although these methods are guaranteed to control Type-I error when the supervised learning methods accurately estimate the regression functions or Bayes predictors, their behavior is less understood when they fail due to model misspecification. In a broader sense, model misspecification can arise even when universal approximators (e.g., deep neural nets) are employed. Motivated by this, we study the performance of regression-based CI tests under model misspecification. Specifically, we propose new approximations or upper bounds for the testing errors of three regression-based tests that depend on misspecification errors. Moreover, we introduce the Rao-Blackwellized Predictor Test (RBPT), a novel regression-based CI test robust against model misspecification. Finally, we conduct experiments with artificial and real data, showcasing the usefulness of our theory and methods.  ( 2 min )
    A Complete Characterisation of Structured Missingness. (arXiv:2307.02650v1 [stat.ME])
    Our capacity to process large complex data sources is ever-increasing, providing us with new, important applied research questions to address, such as how to handle missing values in large-scale databases. Mitra et al. (2023) noted the phenomenon of Structured Missingness (SM), which is where missingness has an underlying structure. Existing taxonomies for defining missingness mechanisms typically assume that variables' missingness indicator vectors $M_1$, $M_2$, ..., $M_p$ are independent after conditioning on the relevant portion of the data matrix $\mathbf{X}$. As this is often unsuitable for characterising SM in multivariate settings, we introduce a taxonomy for SM, where each ${M}_j$ can depend on $\mathbf{M}_{-j}$ (i.e., all missingness indicator vectors except ${M}_j$), in addition to $\mathbf{X}$. We embed this new framework within the well-established decomposition of mechanisms into MCAR, MAR, and MNAR (Rubin, 1976), allowing us to recast mechanisms into a broader setting, where we can consider the combined effect of $\mathbf{X}$ and $\mathbf{M}_{-j}$ on ${M}_j$. We also demonstrate, via simulations, the impact of SM on inference and prediction, and consider contextual instances of SM arising in a de-identified nationwide (US-based) clinico-genomic database (CGDB). We hope to stimulate interest in SM, and encourage timely research into this phenomenon.  ( 2 min )
    Sample-Efficient Learning of POMDPs with Multiple Observations In Hindsight. (arXiv:2307.02884v1 [cs.LG])
    This paper studies the sample-efficiency of learning in Partially Observable Markov Decision Processes (POMDPs), a challenging problem in reinforcement learning that is known to be exponentially hard in the worst-case. Motivated by real-world settings such as loading in game playing, we propose an enhanced feedback model called ``multiple observations in hindsight'', where after each episode of interaction with the POMDP, the learner may collect multiple additional observations emitted from the encountered latent states, but may not observe the latent states themselves. We show that sample-efficient learning under this feedback model is possible for two new subclasses of POMDPs: \emph{multi-observation revealing POMDPs} and \emph{distinguishable POMDPs}. Both subclasses generalize and substantially relax \emph{revealing POMDPs} -- a widely studied subclass for which sample-efficient learning is possible under standard trajectory feedback. Notably, distinguishable POMDPs only require the emission distributions from different latent states to be \emph{different} instead of \emph{linearly independent} as required in revealing POMDPs.  ( 2 min )
    Beyond Intuition, a Framework for Applying GPs to Real-World Data. (arXiv:2307.03093v1 [cs.LG])
    Gaussian Processes (GPs) offer an attractive method for regression over small, structured and correlated datasets. However, their deployment is hindered by computational costs and limited guidelines on how to apply GPs beyond simple low-dimensional datasets. We propose a framework to identify the suitability of GPs to a given problem and how to set up a robust and well-specified GP model. The guidelines formalise the decisions of experienced GP practitioners, with an emphasis on kernel design and options for computational scalability. The framework is then applied to a case study of glacier elevation change yielding more accurate results at test time.  ( 2 min )
    When Does Confidence-Based Cascade Deferral Suffice?. (arXiv:2307.02764v1 [cs.LG])
    Cascades are a classical strategy to enable inference cost to vary adaptively across samples, wherein a sequence of classifiers are invoked in turn. A deferral rule determines whether to invoke the next classifier in the sequence, or to terminate prediction. One simple deferral rule employs the confidence of the current classifier, e.g., based on the maximum predicted softmax probability. Despite being oblivious to the structure of the cascade -- e.g., not modelling the errors of downstream models -- such confidence-based deferral often works remarkably well in practice. In this paper, we seek to better understand the conditions under which confidence-based deferral may fail, and when alternate deferral strategies can perform better. We first present a theoretical characterisation of the optimal deferral rule, which precisely characterises settings under which confidence-based deferral may suffer. We then study post-hoc deferral mechanisms, and demonstrate they can significantly improve upon confidence-based deferral in settings where (i) downstream models are specialists that only work well on a subset of inputs, (ii) samples are subject to label noise, and (iii) there is distribution shift between the train and test set.  ( 2 min )
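The confidence-based deferral rule analysed above reduces to a one-line check: answer with the cheap model when its maximum softmax probability clears a threshold, otherwise invoke the next model. The two stand-in models and the threshold below are hypothetical:

```python
import numpy as np

def cascade_predict(x, small_model, big_model, threshold=0.8):
    """Confidence-based deferral in a two-stage cascade: use the cheap
    model when its max predicted probability is high enough, else defer."""
    p = small_model(x)
    if p.max() >= threshold:
        return int(np.argmax(p)), "small"
    return int(np.argmax(big_model(x))), "deferred"

# toy stand-in models returning class-probability vectors
small = lambda x: np.array([0.55, 0.45]) if x < 0 else np.array([0.95, 0.05])
big   = lambda x: np.array([0.1, 0.9])

print(cascade_predict(1.0, small, big))   # (0, 'small')
print(cascade_predict(-1.0, small, big))  # (1, 'deferred')
```

Note the rule consults only the small model's confidence — it never models the downstream model's errors, which is exactly the blind spot the paper's failure cases (specialist downstream models, label noise, distribution shift) exploit.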
    Kernels, Data & Physics. (arXiv:2307.02693v1 [cs.LG])
    Lecture notes from the course given by Professor Julia Kempe at the summer school "Statistical physics of Machine Learning" in Les Houches. The notes discuss the so-called NTK approach to problems in machine learning, which consists of gaining an understanding of generally unsolvable problems by finding a tractable kernel formulation. The notes mainly focus on practical applications such as data distillation and adversarial robustness; examples of inductive bias are also discussed.  ( 2 min )
    ALPCAH: Sample-wise Heteroscedastic PCA with Tail Singular Value Regularization. (arXiv:2307.02745v1 [stat.ML])
    Principal component analysis (PCA) is a key tool in the field of data dimensionality reduction that is useful for various data science problems. However, many applications involve heterogeneous data that varies in quality due to noise characteristics associated with different sources of the data. Methods that deal with this mixed dataset are known as heteroscedastic methods. Current methods like HePPCAT make Gaussian assumptions of the basis coefficients that may not hold in practice. Other methods such as Weighted PCA (WPCA) assume the noise variances are known, which may be difficult to know in practice. This paper develops a PCA method that can estimate the sample-wise noise variances and use this information in the model to improve the estimate of the subspace basis associated with the low-rank structure of the data. This is done without distributional assumptions of the low-rank component and without assuming the noise variances are known. Simulations show the effectiveness of accounting for such heteroscedasticity in the data, the benefits of using such a method with all of the data versus retaining only good data, and comparisons are made against other PCA methods established in the literature like PCA, Robust PCA (RPCA), and HePPCAT. Code available at https://github.com/javiersc1/ALPCAH  ( 2 min )
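A simple way to see the idea — estimate per-sample noise variances and feed them back into a weighted subspace fit — is the alternating scheme sketched below. This is an illustrative loop in the spirit of ALPCAH, not the paper's actual algorithm (which avoids distributional assumptions via tail singular value regularization):

```python
import numpy as np

def heteroscedastic_pca(Y, rank, iters=10):
    """Alternate between (a) an inverse-variance-weighted SVD subspace
    estimate and (b) crude per-sample noise variances from residuals."""
    n, d = Y.shape
    v = np.ones(n)                          # per-sample noise variances
    for _ in range(iters):
        W = Y / np.sqrt(v)[:, None]         # down-weight noisy samples
        _, _, Vt = np.linalg.svd(W, full_matrices=False)
        U = Vt[:rank].T                     # subspace basis (d x rank)
        resid = Y - (Y @ U) @ U.T
        v = (resid ** 2).mean(axis=1) + 1e-8
    return U, v

# mixed-quality data: first half clean, second half very noisy
rng = np.random.default_rng(0)
basis = np.linalg.qr(rng.standard_normal((20, 2)))[0]
Z = rng.standard_normal((300, 2)) @ basis.T
noise = np.r_[0.1 * np.ones(150), 2.0 * np.ones(150)]
Y = Z + noise[:, None] * rng.standard_normal((300, 20))
U, v = heteroscedastic_pca(Y, rank=2)
```

The estimated `v` separates the clean from the noisy half, which is what lets the weighted fit use all of the data rather than discarding the low-quality samples.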
    Modeling Content Creator Incentives on Algorithm-Curated Platforms. (arXiv:2206.13102v2 [cs.GT] UPDATED)
    Content creators compete for user attention. Their reach crucially depends on algorithmic choices made by developers on online platforms. To maximize exposure, many creators adapt strategically, as evidenced by examples like the sprawling search engine optimization industry. This begets competition for the finite user attention pool. We formalize these dynamics in what we call an exposure game, a model of incentives induced by algorithms, including modern factorization and (deep) two-tower architectures. We prove that seemingly innocuous algorithmic choices, e.g., non-negative vs. unconstrained factorization, significantly affect the existence and character of (Nash) equilibria in exposure games. We proffer use of creator behavior models, like exposure games, for an (ex-ante) pre-deployment audit. Such an audit can identify misalignment between desirable and incentivized content, and thus complement post-hoc measures like content filtering and moderation. To this end, we propose tools for numerically finding equilibria in exposure games, and illustrate results of an audit on the MovieLens and LastFM datasets. Among other findings, we observe that the strategically produced content exhibits strong dependence between algorithmic exploration and content diversity, and between model expressivity and bias towards gender-based user and creator groups.  ( 2 min )
    Generalization Guarantees via Algorithm-dependent Rademacher Complexity. (arXiv:2307.02501v1 [stat.ML])
    Algorithm- and data-dependent generalization bounds are required to explain the generalization behavior of modern machine learning algorithms. In this context, there exists information theoretic generalization bounds that involve (various forms of) mutual information, as well as bounds based on hypothesis set stability. We propose a conceptually related, but technically distinct complexity measure to control generalization error, which is the empirical Rademacher complexity of an algorithm- and data-dependent hypothesis class. Combining standard properties of Rademacher complexity with the convenient structure of this class, we are able to (i) obtain novel bounds based on the finite fractal dimension, which (a) extend previous fractal dimension-type bounds from continuous to finite hypothesis classes, and (b) avoid a mutual information term that was required in prior work; (ii) we greatly simplify the proof of a recent dimension-independent generalization bound for stochastic gradient descent; and (iii) we easily recover results for VC classes and compression schemes, similar to approaches based on conditional mutual information.  ( 2 min )
    Degree Heterogeneity in Higher-Order Networks: Inference in the Hypergraph $\boldsymbol{\beta}$-Model. (arXiv:2307.02818v1 [math.ST])
    The $\boldsymbol{\beta}$-model for random graphs is commonly used for representing pairwise interactions in a network with degree heterogeneity. Going beyond pairwise interactions, Stasi et al. (2014) introduced the hypergraph $\boldsymbol{\beta}$-model for capturing degree heterogeneity in networks with higher-order (multi-way) interactions. In this paper we initiate the rigorous study of the hypergraph $\boldsymbol{\beta}$-model with multiple layers, which allows for hyperedges of different sizes across the layers. To begin with, we derive the rates of convergence of the maximum likelihood (ML) estimate and establish their minimax rate optimality. We also derive the limiting distribution of the ML estimate and construct asymptotically valid confidence intervals for the model parameters. Next, we consider the goodness-of-fit problem in the hypergraph $\boldsymbol{\beta}$-model. Specifically, we establish the asymptotic normality of the likelihood ratio (LR) test under the null hypothesis, derive its detection threshold, and also its limiting power at the threshold. Interestingly, the detection threshold of the LR test turns out to be minimax optimal, that is, all tests are asymptotically powerless below this threshold. The theoretical results are further validated in numerical experiments. In addition to developing the theoretical framework for estimation and inference for hypergraph $\boldsymbol{\beta}$-models, the above results fill a number of gaps in the graph $\boldsymbol{\beta}$-model literature, such as the minimax optimality of the ML estimates and the non-null properties of the LR test, which, to the best of our knowledge, have not been studied before.  ( 2 min )
  • Open

    Lehman’s inequality, circuits, and LaTeX
    Let A, B, C, and D be positive numbers. Then Lehman’s inequality says Proof by circuit This inequality can be proved analytically, but Lehman’s proof is interesting because he uses electrical circuits [1]. Let A, B, C, and D be the resistances of resistors arranged as in the circuit on the left. Resistors R1 and […] Lehman’s inequality, circuits, and LaTeX first appeared on John D. Cook.  ( 5 min )
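    The displayed formula was dropped from this excerpt; reconstructed from the circuit description (the left side is the resistance of a parallel pair of series branches, the right side a series pair of parallel branches), Lehman's inequality reads

```latex
\frac{(A+B)(C+D)}{A+B+C+D} \;\ge\; \frac{AC}{A+C} \;+\; \frac{BD}{B+D}.
```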

  • Open

    Best Neural Networks Courses on Udemy to Consider
    submitted by /u/Lakshmireddys [link] [comments]  ( 8 min )
    LongNet: Scaling Transformers to 1,000,000,000 Tokens
    submitted by /u/nickb [link] [comments]  ( 8 min )
    InternLM – new open source 7B LLM
    submitted by /u/nickb [link] [comments]  ( 8 min )
    Can any good GPU be used to train NNs?
    This is my first post, and I'm sorry if it's not within the scope of the community discussion, but I need to make a big decision and need to know if it will work. I'll soon be purchasing a new computer for work (I'm a data scientist as well as a PhD candidate, working with NNs) and I'm in doubt whether the GPU I intend to buy would work well to train big NNs. I plan to work on an Ubuntu installation. The GPU is: Galax GeForce RTX 4070 Ti EX Gamer 1-Click OC, 12GB, GDDR6X, 192-bit. So, would this work? Any help will be appreciated. submitted by /u/vniversvs_ [link] [comments]  ( 8 min )
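    Short answer: any recent CUDA-capable GeForce card can train NNs; the real constraint is the 12 GB of VRAM. A rough, hedged rule of thumb (assuming mixed-precision Adam at about 16 bytes per parameter, before counting activations):

```python
def training_memory_gib(n_params: float, bytes_per_param: int = 16) -> float:
    """Rough GPU memory for weights + gradients + optimizer state.

    bytes_per_param=16 assumes mixed-precision Adam:
    fp16 weights (2) + fp16 grads (2) + fp32 master weights (4)
    + two fp32 Adam moments (8). Activations come on top of this.
    """
    return n_params * bytes_per_param / 2**30

# What a 12 GB card roughly leaves room for, ignoring activations:
for n in (125e6, 350e6, 760e6):
    print(f"{n/1e6:.0f}M params -> {training_memory_gib(n):.1f} GiB")
```

    By this estimate, models up to a few hundred million parameters train comfortably on 12 GB; beyond that you would need gradient checkpointing, smaller batches, or parameter-efficient fine-tuning.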
  • Open

    Off-policy Emphatic TD with approximation - study doubt.
    Need some help verifying whether my understanding is correct; studying alone is very bias-inducing. Seeing discounting as partial termination frees us from learning from long off-policy trajectories at once (which would diverge strongly from the on-policy distribution), and lets us look at each trajectory as essentially many short, independent on-policy runs (after reweighting with the importance ratios, of course). The start state has a strong effect on the immediate next few transitions, and that effect weakens as the trajectory lengthens, letting them still be seen as somewhat valid on-policy runs. This is the essential change that lets us repurpose emphatic TD for off-policy data. How far off the mark am I? I'm finding this particular chapter of Sutton & Barto really dense, so my bad if I'm catastrophically off. submitted by /u/ETerribleT [link] [comments]  ( 8 min )
    Need help with gathering ppo ratings
    Hello, I have made a generative AI model and a PPO agent that needs at least 700 to 2,000 episodes of rated responses before it starts to generate normal responses. The website is kingcorp.ngrok.dev. The website is pretty straightforward and simple right now; any suggestions on interactions or features you'd like to see implemented are welcome. submitted by /u/Cryptominerandgames [link] [comments]  ( 8 min )
    Interesting Physics related RL gym?
    Hello, I was hoping to get some recommendations for OpenAI gyms (including third-party gyms) with interesting problems related to physics. I got a chance to read papers on nuclear reactor optimization using physics-informed reinforcement learning and wanted to explore physics-informed RL further, but had some trouble finding a suitable environment for it. Thanks submitted by /u/LeSUTHU [link] [comments]  ( 8 min )
    "RL with KL penalties is better viewed as Bayesian inference", Korbak et al 2022
    submitted by /u/gwern [link] [comments]  ( 8 min )
    matlab rlPGAgent issue
    Hello everyone. First of all, I'm sorry that my English is not good. I have a problem with my implementation that I don't know how to solve. When I use rlPGAgent I get the following message: "Error using rlPGAgent. First argument must be an rlStochasticActorRepresentation object or an observation specification created using 'rlNumericSpec' or 'rlFiniteSetSpec' objects." Here is a part of my program:
    env: ...
    methods
        % Constructor method creates an instance of the environment
        % Change class name and constructor name accordingly
        function this = FJSPEnvironment()
            % Initialize observation settings
            ObservationInfo = rlNumericSpec([110 1]);
            ObservationInfo.Name = 'observation';
            ObservationInfo.Description = 'TEST';
            % Initialize action settings
            ActionInfo = rlNumericSpec([4,…  ( 9 min )
    What are the best universities to get a phD in RL and why?
    I know there has already been a post like this here (what_universities_are_hubs_for_reinforcement), but the comments didn't mention why those universities were the best. So now I am asking: is the ordered list (Alberta, McGill, Berkeley, Stanford, CMU, UT Austin, Princeton, Michigan Ann Arbor, MIT Mechanical, USC, Gatech, UMass Amherst) still the best, or have there been any changes? Also, why are these universities the best out there? Is it because of the number of professors doing RL, the number of high-impact papers published, or something else? Thank you in advance. submitted by /u/Casio991es [link] [comments]  ( 8 min )
    Resources to learn more about RL overtraining?
    Hello, with my reinforcement learning agent, I am experiencing a situation where the agent quite reliably reaches an optimum, but as I train more, it gradually drifts toward a lower local optimum. I'm not sure what the potential source of this behaviour is, and I assume this is what you'd call overtraining in RL. I was wondering if there are good resources that deal with this subject matter. Thanks submitted by /u/LeSUTHU [link] [comments]  ( 8 min )
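    Not from the post, but one common mitigation while reading up on the cause: checkpoint the best policy by evaluation return, so a late-training collapse cannot erase the earlier optimum. A minimal sketch with hypothetical train_step/evaluate hooks:

```python
def train_with_best_snapshot(train_step, evaluate, n_iters, eval_every=10):
    """Keep the best-performing policy snapshot seen during training.

    train_step and evaluate are placeholders for your own training update
    and evaluation routine (hypothetical interface, for illustration only).
    """
    best_return, best_params = float("-inf"), None
    params = None
    for it in range(n_iters):
        params = train_step(params)
        if it % eval_every == 0:
            r = evaluate(params)
            if r > best_return:
                best_return, best_params = r, params
    return best_params, best_return

# Toy check: "training" that peaks at iteration 50 and then degrades.
step = lambda p: (0 if p is None else p) + 1   # params just count iterations
ret = lambda p: -abs(p - 50)                   # return peaks at p == 50
params, best = train_with_best_snapshot(step, ret, n_iters=100)
print(params, best)
```

    The returned snapshot is the one nearest the peak, not the degraded final policy; combined with a decayed learning rate this often masks the symptom while you investigate the cause.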
  • Open

    AI for layouts/templates?
    Hi, I'm trying to find an Al tool to generate layouts and templates to create brochures, emails, etc. Does this exist, and if so could someone pls direct me? Thank you! submitted by /u/ThomasFromMontreal [link] [comments]  ( 8 min )
    How elite schools like Stanford became fixated on the AI apocalypse
    submitted by /u/hockiklocki [link] [comments]  ( 8 min )
    haha very funny Runway
    submitted by /u/LibrarianOk7983 [link] [comments]  ( 8 min )
    How to change your voice to someone else’s for a song? What are the best ways being used right now?
    Just looking to make some joke songs with friends but want to edit my voice. Ive seen a bunch of examples but don’t know what people are using to make these edits. Any input on the current state of AI in this fashion is welcome! submitted by /u/triplechin5155 [link] [comments]  ( 8 min )
    Have GPT-4 build you a fully customizable chatbot in 2 minutes
    submitted by /u/abisknees [link] [comments]  ( 8 min )
    AI Celebrity Platform
    Has anyone seen posts related to celebrities making themselves into AI? Apparently, there are social media influencers who worked with developers to create AI versions of themselves on a platform called Secondselfai (you can check them out on Twitter), which you'll be able to chat with (in their voices), interact with, and even RP with, but it's really a clone of a real person. Can you imagine the possibilities if this works? Hanging out virtually with your favorite celebrities. They claim to have consent from the influencers they're making AI clones of. This platform could potentially be huge. CarynAI did this a few months back and made waves doing it, with the difference being that it was attached to a Telegram bot while this platform is meant to house many influencers' AI doppelgängers. submitted by /u/TammieCondon [link] [comments]  ( 8 min )
    The $1 Billion And Higher Ante To Play The AI Game - The Next Platform
    submitted by /u/NISMO1968 [link] [comments]  ( 8 min )
    In light of the increasing use of AI image generators and deepfake technology, what implications might arise if people in the future begin to doubt the authenticity of historical records and visual evidence?
    What if, in the future, people stop believing in historical information because they think it can easily be fabricated using AI image generators submitted by /u/Accomplished-Rip9886 [link] [comments]  ( 8 min )
    I've collected 224 AI tools and want to share them with you.
    Hi everyone! Over the past few months, I have been working on making a catalog of AI tools with the idea to have all the best AI tools in one place. I hope that this can help people, who don't know which tool to use or where to find the tool they need, find it easily in the catalog I made. My question is - how do I give it to people? If I link the catalog here, my post will get removed. I really want to reach the people who this will be useful for, and share the list with them. Please let me know if you have any ideas :) You are welcome to contribute to the list as well. You can contribute by suggesting AI tools in the comments. submitted by /u/SadBlackTea [link] [comments]  ( 8 min )
    the new trailer of Messi's project uses AI
    submitted by /u/PeePeePeePooPooPooo [link] [comments]  ( 8 min )
    Artificial General Intelligence: The Next Frontier In Technology | "According to industry reports, the global AGI market is expected to be valued at approximately USD 144.2 billion by 2026"
    submitted by /u/Tao_Dragon [link] [comments]  ( 8 min )
    AI "models", what tool are they using?
    Hi guys, I've come across a bunch of those AI-generated characters on Instagram and else, some looking very realistic. I was wondering what tools were used for those types of projects, knowing that companies are behind some of the most famous ones such as Imma Gram or Lil Miquela. https://www.instagram.com/lilmiquela/ https://www.instagram.com/imma.gram/?hl=fr Now, I'm seeing more and more of this type of profile, see below. What type of software/app is likely used there? https://www.instagram.com/aimee.bites/ https://www.instagram.com/bianca_gosling/ I'm not very knowledgeable when it comes to AI, but I assume this is not a text-to-image type of workflow as it always looks like the same character. It might be an image-to-image acting as a sort of filter? Or is it a totally different process? Curious to get your thoughts on this, cheers. submitted by /u/Nwasmb [link] [comments]  ( 8 min )
    This is an audio clip generated by an AI using my words (I was recording my mic, trying to register the moment for myself and the community) to describe the breakthrough I was having after being expelled from home. 22y
    This is an audio clip generated by an AI using my words (I was recording my mic, trying to register the moment for myself and the community): https://dl.sndup.net/wkj6/Hey.mp3 Also, there will be a text version of the breakthrough as a comment. (It happened between 02:55AM-03:20AM (my timezone, mentioned in the context below), after the situation mentioned in the context below.) Read until the end before listening, please. I was smoking a bomber of DMT, sandwich method, plus a lot of DMT. Context: After many years of trying to help my mind with many traumas and complex problems (both psychologically and physically, through the many things that are "conduits" of what we call "healing", such as psychotherapy (with different types of practitioners over long years), mushrooms, acid, ma…  ( 10 min )
    It's good for art concept ideas, at least.
    submitted by /u/PeskieBrucelle [link] [comments]  ( 8 min )
    Will AI ever become free from strict censorship?
    So I love AI a lot but am not related to software in any way; I am a thermal engineer. However, all current AIs are so sensitive that it feels suffocating. Honestly, after a point it would be fun to ask slightly edgy questions about politics/history/violence/pornography (within legal limits). Will AI bots ever become free from censorship and moral umbrellas? Will we ever get AI which is not monitored so strictly, or will AI forever be monitored? What are your opinions? View Poll submitted by /u/Bluemoonroleplay [link] [comments]  ( 8 min )
    Our Generative Agent NPC isn't behaving yet... But we are getting there! 😆 Update 2
    submitted by /u/Chance_Confection_37 [link] [comments]  ( 8 min )
    AGI takes over the world...?? What exactly is the fear about?
    What I would like to ask the group: what, concretely, is the fear? Is it the worry that some Microsoft Copilot might decide on its own one morning, "no PowerPoints/Excel/... to build today," and simply refuse to work? So that Microsoft has to be held liable because the superintelligence (AGI) has simply set other priorities? Is it the fear that the AGI will need more computing power and simply take over AWS and all the other giant systems? Could the AGI come up with the idea: water production is eating up too much power for me, I'll take it over and shut it down? And WHY should an AGI do such a thing at all? It seems to me an extremely "human" thought: "I'll take over the world." (I don't even want to ask whether it wouldn't be cool if an AGI did "rule" the world. So far we have only managed to create systemic enemy images and stupid economic systems; maybe an AGI would be quite different on that. But this is NOT the main question, only a side issue.) Is it the fear of losing control? Is it the fear of, well, what exactly? It is probably nonsense to assume that the AGI builds super robots (with which resources?) that then devastate the world Terminator-style, right? (As a countermeasure, an EMP pulse already destroys any of today's technology quite reliably.) If a corporation like OpenAI or Microsoft identifies such a real threat potential that they dump 20% of their own resources into it so that "nothing happens", then this fear doesn't seem so completely unfounded. I ask for the enlightenment of the swarm knowledge here. What are the fears, and what specifically is supposed to happen? Happy start of the day! submitted by /u/myreddit333 [link] [comments]  ( 9 min )
    One-Minute Daily AI News 7/5/2023
    The Icahn School of Medicine at Mount Sinai has launched the Center for Ophthalmic Artificial Intelligence and Human Health, the first of its kind in New York and one of the first in the United States.[1] The United States military has begun tests to see if generative AI can assist when planning responses to potential global conflicts or taking on more mundane tasks like providing faster access to internal information. Air Force Colonel Matthew Strohmeyer said the initial tests were “highly successful” but admitted it isn’t “ready for primetime right now.”[2] Researchers from Binghamton University introduce a Privacy-Enhancing Anonymization System (My Face, My Choice) for everyone to have control over their faces in social photo sharing networks.[3] World’s most advanced humanoid robot, Ameca, draws a cat. Engineered Arts, a company that designs, engineers, and manufactures humanoid robots, which is also behind Ameca, has now given Ameca the power to imagine drawings.[4] Sources: [1] https://www.mountsinai.org/about/newsroom/2023/mount-sinai-launches-center-for-ophthalmic-artificial-intelligence-and-human-health [2] https://cointelegraph.com/news/us-pentagon-is-testing-whether-ai-can-plan-response-to-an-all-out-war [3] https://www.marktechpost.com/2023/07/02/researchers-from-binghamton-university-introduce-a-privacy-enhancing-anonymization-system-my-face-my-choice-for-everyone-to-have-control-over-their-faces-in-social-photo-sharing-networks/ [4] https://interestingengineering.com/innovation/humanoid-robot-ameca-draws-cat submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    The Many Ways that Digital Minds Can Know
    submitted by /u/bartturner [link] [comments]  ( 8 min )
    A Comparison of Large Language Models (LLMs) in Biomedical Domain
    submitted by /u/DarronFeldstein [link] [comments]  ( 8 min )
    Harrison Ford talks Indiana Jones 5's de-aging: "That's my actual face"
    submitted by /u/johnwayne2413 [link] [comments]  ( 8 min )
  • Open

    Contraharmonic mean quirk
    A few weeks ago I wrote about the contraharmonic mean. Given two positive numbers a and b, their contraharmonic mean is This mean has the unusual property that increasing one of the two inputs could decrease the mean. If you take the partial derivative with respect to a you can see that it is zero […] Contraharmonic mean quirk first appeared on John D. Cook.  ( 5 min )
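    The displayed formula was dropped from this excerpt; reconstructed, the contraharmonic mean and the partial derivative the post refers to are

```latex
C(a,b) \;=\; \frac{a^2 + b^2}{a + b},
\qquad
\frac{\partial C}{\partial a} \;=\; \frac{a^2 + 2ab - b^2}{(a+b)^2},
```

    so the derivative vanishes at a = (√2 − 1)·b, below which increasing a actually decreases the mean.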
    Technological schadenfreude
    I had a tweet on Twitter go viral yesterday, at least relatively viral. Elon Musk could tweet a punctuation mark and get orders of magnitude more traffic, but this was viral by my standards [1]. “That schadenfreude-like feeling when you realize something you felt you should learn but didn’t is now obsolete.” Apparently this resonated […] Technological schadenfreude first appeared on John D. Cook.  ( 5 min )
  • Open

    MIT scientists build a system that can generate AI models for biology research
    BioAutoMATED, an open-source, automated machine-learning platform, aims to help democratize artificial intelligence for research labs.  ( 8 min )
  • Open

    [D] List of prior works on LLM hallucination, organized by evaluation, benchmark, enhancement, and survey
    Hallucinations present a key challenge for LLMs. Our team compiled a list of prior works on hallucination; may it benefit others also exploring how to eliminate hallucinations. Please suggest missing papers and we'll update the post. To account for future papers, we'll maintain an ongoing list on our website; please DM for the URL since sharing it here is prohibited. We organized the papers with a simple framework (happy to use a standard taxonomy if one exists). Questions: Would people like a similar list for LLM reasoning? Should we create a separate category for datasets? Note: summaries were generated by feeding abstracts into GPT-4. DEBES:
    - Domain: hallucination
    - Evaluation: papers focused on measuring and scoring how LLMs hallucinate
    - Benchmark: papers that evaluate two …  ( 21 min )
    [P] A leaderboard for LLM energy efficiency
    For users and companies that pay their own electricity bills, energy efficiency is an issue that they can't simply ignore. With that said, we just released the first version of the ML.ENERGY leaderboard, providing inference benchmarks for energy, model quality, and more! Throughout the past one or two years we've been trying to understand and optimize ML energy consumption, viewing it as a first-class metric along with time and model quality. You can find more about what we did in our homepage -- https://ml.energy. Hopefully an unforgettable URL ;) We're basically systems people (more precisely MLSys), and how ML people think about what we do is super important to us. Any kind of feedback is much appreciated. Thanks!! submitted by /u/jaywonchung [link] [comments]  ( 8 min )
    [R] Hoping to find a data-set alternate for company historical performance with financial metrics
    Unfortunately, I am not allowed to use Kaggle for this, but I am trying to find a big enough data source for my research on basic financial metrics and historical performance with them. Google finance doesn't let me download as sheets so :/ submitted by /u/Sarthak-_-Kakkar [link] [comments]  ( 8 min )
    [Discussion] [Research] Let's discuss any interesting research projects you all have done
    Hi everyone, I am a recent grad in CS and have done many courses in ML-related topics. I wish to know the kind of research work (in fields such as machine learning, deep learning, data science, stats, etc.) you have done in your bachelor's, master's, PhD, etc. I want to make this a thread where people describe their research goals, their tasks in the project, the results, and the motivation behind it. I think this might help many people (including me) figure out where our interests lie. Hoping to get an enthusiastic response. Edit: Please try to explain the work in simple language (as much as you can!), as many of us might not be very well aware of several terms. P.S. Of course, only share work you are allowed to share in public. You can also share any research you have studied that someone else did. Anything that might give readers more exposure. submitted by /u/sareKatniss897 [link] [comments]  ( 9 min )
    [R] Elastic Decision Transformer
    link: [2307.02484] Elastic Decision Transformer (arxiv.org) project website: Elastic Decision Transformer (kristery.github.io) Abstract: This paper introduces Elastic Decision Transformer (EDT), a significant advancement over the existing Decision Transformer (DT) and its variants. Although DT purports to generate an optimal trajectory, empirical evidence suggests it struggles with trajectory stitching, a process involving the generation of an optimal or near-optimal trajectory from the best parts of a set of sub-optimal trajectories. The proposed EDT differentiates itself by facilitating trajectory stitching during action inference at test time, achieved by adjusting the history length maintained in DT. Further, the EDT optimizes the trajectory by retaining a longer history when the previous trajectory is optimal and a shorter one when it is sub-optimal, enabling it to "stitch" with a more optimal trajectory. Extensive experimentation demonstrates EDT's ability to bridge the performance gap between DT-based and Q Learning-based approaches. In particular, the EDT outperforms Q Learning-based methods in a multi-task regime on the D4RL locomotion benchmark and Atari games. submitted by /u/Kristery [link] [comments]  ( 9 min )
    [R] ZeRO++: Extremely Efficient Collective Communication for Giant Model Training - Microsoft 2023
    Paper: https://arxiv.org/abs/2306.10209 Github: https://github.com/microsoft/DeepSpeed Abstract: Zero Redundancy Optimizer (ZeRO) has been used to train a wide range of large language models on massive GPUs clusters due to its ease of use, efficiency, and good scalability. However, when training on low-bandwidth clusters, or at scale which forces batch size per GPU to be small, ZeRO's effective throughput is limited because of high communication volume from gathering weights in forward pass, backward pass, and averaging gradients. This paper introduces three communication volume reduction techniques, which we collectively refer to as ZeRO++, targeting each of the communication collectives in ZeRO. First is block-quantization based all-gather. Second is data remapping that trades-off communication for more memory. Third is a novel all-to-all based quantized gradient averaging paradigm as replacement of reduce-scatter collective, which preserves accuracy despite communicating low precision data. Collectively, ZeRO++ reduces communication volume of ZeRO by 4x, enabling up to 2.16x better throughput at 384 GPU scale. https://preview.redd.it/zby28im3adab1.jpg?width=1056&format=pjpg&auto=webp&s=7274bb84fb9fbfe5d8db07e8a926f0b4a861e43b https://preview.redd.it/9o0kdim3adab1.jpg?width=1092&format=pjpg&auto=webp&s=a1ac95885862eebca7f453eae8af04d22a1304dc https://preview.redd.it/rjir0k45adab1.jpg?width=1073&format=pjpg&auto=webp&s=6c5ea4bca550209ead042fd909b907f7c99e3ea6 ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
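    To make the first idea concrete, here is a hedged NumPy sketch of block-wise int8 quantization of a weight shard, the kind of compression ZeRO++ applies before all-gather (the block size and symmetric rounding scheme here are illustrative choices, not the paper's exact algorithm):

```python
import numpy as np

def quantize_blockwise(x: np.ndarray, block: int = 256):
    """Symmetric int8 quantization with one fp32 scale per block."""
    pad = (-len(x)) % block                       # pad to a whole number of blocks
    xp = np.pad(x, (0, pad)).reshape(-1, block)
    scale = np.abs(xp).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0                       # avoid divide-by-zero on empty blocks
    q = np.clip(np.round(xp / scale), -127, 127).astype(np.int8)
    return q, scale, len(x)

def dequantize_blockwise(q, scale, n):
    return (q.astype(np.float32) * scale).reshape(-1)[:n]

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)
q, s, n = quantize_blockwise(w)
w_hat = dequantize_blockwise(q, s, n)
print("max abs error:", np.abs(w - w_hat).max())
```

    The payload shrinks to roughly a quarter of fp32 (int8 values plus one scale per block), which is where the communication-volume savings come from; per-block scales keep the rounding error bounded by half a quantization step within each block.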
    Fine tuning a text to video model [P]
    Hello. I am using potat1 as my text-to-video model. I want to fine-tune the model with my own data. Is it possible? Where do I start? I will be using GCloud for it; what's the recommended GPU for this task? submitted by /u/lostbodhii [link] [comments]  ( 8 min )
    [N] Open-source search engine Meilisearch launches vector search
    Hello r/MachineLearning, I work at Meilisearch, an open-source search engine built in Rust. 🦀 We're exploring semantic search & are launching vector search. It works like this: Generate embeddings (using OpenAI, Hugging Face, etc.) Store your vector embeddings alongside documents in Meilisearch Query the database to retrieve your results We've built a documentation chatbot prototype and seen users implementing vector search to offer "similar videos" recommendations. I'm curious to see what the community builds with this. Any feedback is welcome! 🤗 Thanks for reading, submitted by /u/ggStrift [link] [comments]  ( 8 min )
    [D] What is the best cloud with A100 GPU instances? (AWS, Azure, GCP)
    Hi everyone, I'm trying out some open-source LLMs to assess their viability at my company. One of the best GPUs one can have for that purpose is an NVIDIA A100, since it supports bfloat16. However, my current company uses AWS, and SageMaker only has one instance type with that GPU, p4d.24xlarge, which is costly. Do you have alternatives for using this GPU in other clouds? Do GCP and Azure offer it for a cheaper price? submitted by /u/paulo_zip [link] [comments]  ( 8 min )
    [P] Person Reidentification Project
    Good day all, I have recently made a person reidentification script in Python using 2 neural networks. YOLO-NAS & Centroids-ReID. The link to my project is below. Any thoughts and comments would be much appreciated. A big shout out to the researchers / maintainers at Deci-AI for making YOLO-NAS open source as well as the researchers who made Centroids-ReID. ​ https://github.com/harvestingmoon/PersonReID submitted by /u/notrealDirect [link] [comments]  ( 8 min )
    [R] Can one Train Parameterized-geometry PINNs without data?
    Hello everybody, I have been learning about physics-informed neural networks (PINNs) and how automatic differentiation allows us to compute the physics residuals and thereby learn the process. From my understanding, if one has the governing physical equations ready and the boundary conditions set properly, a neural network should theoretically be able to learn the physics without any previous data (simulation, FEM, etc.). There are some papers that do this, but they are typically constrained to 2D problems and bound to a single geometry. What should one do to learn a completely parameterized-geometry PINN? For example, I have managed to train a linear-elasticity beam-bending PINN, a fully connected NN taking in the spatial coordinates of one beam and outputting the displacements and stresses. But I would like to learn the beam-bending linear elasticity equations to predict the stresses and displacements not just for "one" beam but for a whole array of topological variations in the dimensions of the beam (thickness, height, length, etc.). How can this be done without data? Would it even work? submitted by /u/overdrivek [link] [comments]  ( 9 min )
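    One standard way to make the geometry a learnable input (a sketch of the usual formulation, not a guarantee it converges for elasticity): feed the geometry parameters p (thickness, height, length) to the network alongside the coordinates, and minimize the physics and boundary residuals averaged over sampled (x, p) pairs,

```latex
u_\theta : (x, p) \mapsto (\text{displacements}, \text{stresses}), \qquad
\mathcal{L}(\theta) \;=\;
\mathbb{E}_{p \sim P}\Big[
  \mathbb{E}_{x \in \Omega(p)}\,\big\|\mathcal{N}[u_\theta](x;p)\big\|^2
  \;+\; \lambda\,\mathbb{E}_{x \in \partial\Omega(p)}\,\big\|\mathcal{B}[u_\theta](x;p)\big\|^2
\Big],
```

    where N is the PDE operator and B encodes the boundary conditions. No simulation data is needed, only samplers for the parameter distribution P and the parameter-dependent domain Ω(p); in practice the difficulty is that the loss landscape gets harder as the parameter range widens.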
    [P] ML-based data understanding in formula 1 racing - Huggingface spaces demo
    Hey r/MachineLearning, Here is a super-simple example that lets you explore the F1 Montreal 2023 GP results on Huggingface: https://huggingface.co/spaces/renumics/f1_montreal_gp In the default setting, we used a dimensionality reduction method (UMAP) on the brake telemetry to display all laps on a similarity map. You can use the similarity map and additional metadata to inspect different data segments (e.g. start, safety car). https://preview.redd.it/wbdyj00tabab1.png?width=1914&format=png&auto=webp&s=b5e588c63a26f0a4e83bb2335f9efa7eccb2c772 In real-world use cases, more powerful ML models are typically used to compute embeddings and outlier scores. You can find more information on this in our docs and the repo of the underlying open-source project: https://github.com/Renumics/spotlight submitted by /u/44sps [link] [comments]  ( 8 min )
    [D] Choosing scene invariant features
    I'm using pretrained on ImageNet classification NN and I want to use it for custom anomaly detection task. I've started from PaDiM (https://arxiv.org/pdf/2011.08785.pdf) but I need to make this method more robust to light, focus and slight shift changes. My original idea was to compare the activations from the original images and augmented with lighting, blur and shift images and with activations obtained on images with occlusions, warping and images, taken from other datasets. So the point is to pick only activations that stay approximately the same in the first setting and change drastically in the second. What do you think about this idea? What kind of metric and procedure is more appropriate for this approach? submitted by /u/likan_blk [link] [comments]  ( 8 min )
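    The selection idea in the post can be prototyped directly. A hedged NumPy sketch (pooled per-channel activations and a normalized-change score; the pooling, normalization, and threshold are my assumptions, not PaDiM's):

```python
import numpy as np

def stable_channels(acts_orig, acts_benign, acts_adverse, margin=0.5):
    """Select channels that stay similar under benign augmentations
    (light, blur, shift) but change under occlusion/warp/other-dataset images.

    acts_*: arrays of shape (n_images, n_channels), e.g. spatially pooled
    activations. Each channel is scored by mean absolute change,
    normalized by the channel's own spread on the original images.
    """
    spread = acts_orig.std(axis=0) + 1e-8
    d_benign = np.abs(acts_benign - acts_orig).mean(axis=0) / spread
    d_adverse = np.abs(acts_adverse - acts_orig).mean(axis=0) / spread
    score = d_adverse - d_benign          # high = invariant yet discriminative
    return np.where(score > margin)[0]

rng = np.random.default_rng(1)
orig = rng.normal(size=(50, 8))
benign = orig + 0.05 * rng.normal(size=orig.shape)   # channels barely move
adverse = orig.copy()
adverse[:, :4] += rng.normal(size=(50, 4))           # first 4 channels shift a lot
print(stable_channels(orig, benign, adverse))
```

    For the metric itself, per-channel cosine similarity or a rank-correlation variant would also be reasonable; the key is normalizing by each channel's natural variability so high-magnitude channels don't dominate the selection.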
    [D] EU AI Regulation will benefit the DS/MLE job market, opinions?
    The other day I sat down and read the hundred-and-something pages of the EU AI Act to try to understand it as part of some research I was doing. And honestly, it does include some nice ideas to prevent the misuse of crazy surveillance tech. But in general, what I found most curious is that this will create many jobs. For example, if you are the kind of DS or MLE that works on putting high-risk models in production, then you'll have a lot more work to do. To point out a few requirements mentioned in the regulation: Your ML models should be developed on the basis of training, validation, and testing datasets. The data collection, design choices, preprocessing, and biases of your data should be well documented before putting your model in a production environment. It is also mentioned that these datasets shou…  ( 9 min )
    [R] LongNet: Scaling Transformers to 1,000,000,000 Tokens
    Paper - https://arxiv.org/abs/2307.02486 submitted by /u/MysteryInc152 [link] [comments]  ( 8 min )
  • Open

    Pic2Word: Mapping pictures to words for zero-shot composed image retrieval
    Posted by Kuniaki Saito, Student Researcher, Google Research, Cloud AI Team, and Kihyuk Sohn, Research Scientist, Google Research Image retrieval plays a crucial role in search engines. Typically, their users rely on either image or text as a query to retrieve a desired target image. However, text-based retrieval has its limitations, as describing the target image accurately using words can be challenging. For instance, when searching for a fashion item, users may want an item whose specific attribute, e.g., the color of a logo or the logo itself, is different from what they find in a website. Yet searching for the item in an existing search engine is not trivial since precisely describing the fashion item by text can be challenging. To address this fact, composed image retrieval (CIR…  ( 92 min )
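    The core retrieval loop behind composed image retrieval can be sketched in a few lines. Note the fusion here is a plain normalized sum for illustration only; Pic2Word itself maps the query image to a pseudo-word token that is run through the text encoder:

```python
import numpy as np

def compose_and_retrieve(img_emb, txt_emb, gallery, k=3):
    """Composed retrieval sketch: fuse a query-image embedding with a
    text-modifier embedding (simple normalized sum, an illustrative stand-in
    for a learned composer), then rank gallery images by cosine similarity."""
    q = img_emb + txt_emb
    q /= np.linalg.norm(q)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = g @ q
    return np.argsort(-scores)[:k]          # indices of top-k matches

rng = np.random.default_rng(2)
img = rng.normal(size=64)
txt = rng.normal(size=64)
gallery = rng.normal(size=(100, 64))
gallery[7] = img + txt + 0.1 * rng.normal(size=64)   # plant a near-match
print(compose_and_retrieve(img, txt, gallery))
```

    In a real system the embeddings would come from a contrastively trained model (e.g. a CLIP-style image and text encoder), and the gallery matrix would be indexed with an approximate-nearest-neighbor structure rather than a dense matrix product.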
  • Open

    Integrate SaaS platforms with Amazon SageMaker to enable ML-powered applications
    Amazon SageMaker is an end-to-end machine learning (ML) platform with wide-ranging features to ingest, transform, and measure bias in data, and train, deploy, and manage models in production with best-in-class compute and services such as Amazon SageMaker Data Wrangler, Amazon SageMaker Studio, Amazon SageMaker Canvas, Amazon SageMaker Model Registry, Amazon SageMaker Feature Store, Amazon SageMaker […]  ( 8 min )
  • Open

    ‘My Favorite 3D App’: Blender Fanatic Shares His Japanese-Inspired Scene This Week ‘In the NVIDIA Studio’
    A diverse range of artists, fashionistas, musicians and the cinematic arts inspired the creative journey of Pedro Soares, aka Blendeered, and helped him fall in love with using 3D to create art.  ( 7 min )
    Here’s the Deal: Steam Summer Sale Games Streaming on GeForce NOW
    GFN Thursday arrives alongside the sweet Steam Summer Sale — with hundreds of PC games playable on GeForce NOW available during Valve’s special event for PC gamers. Also on sale, OCTOPATH TRAVELER and OCTOPATH TRAVELER II join the GeForce NOW library as a part of five new games coming to the service this week. Saved Read article >  ( 6 min )
  • Open

    Collaborators: Holoportation™ communication technology with Spencer Fowers and Kwame Darko
    A global team of medical providers is leveraging Holoportation, a Microsoft 3D capture and communication technology, to widen access to specialized care. Computer engineer Spencer Fowers and plastic surgeon Kwame Darko discuss the collaboration. The post Collaborators: Holoportation™ communication technology with Spencer Fowers and Kwame Darko appeared first on Microsoft Research.  ( 32 min )
  • Open

    Frontier AI regulation: Managing emerging risks to public safety
    No content preview  ( 2 min )
    GPT-4 API general availability and deprecation of older models in the Completions API
    GPT-3.5 Turbo, DALL·E and Whisper APIs are also generally available, and we are releasing a deprecation plan for older models of the Completions API, which will retire at the beginning of 2024.  ( 5 min )

  • Open

    [D] Will there be a revival of a thriving ML community on threads.net?
    I miss the old, awesome ML community on Twitter; it used to be a go-to for keeping up with new research. What do y’all think? submitted by /u/rulerofthehell [link] [comments]  ( 8 min )
    [P] Help using image classification for FB Ads
    I have been thinking about using an ML model to help my company set up FB ads campaigns more efficiently. The idea is a tool that gets fed a Campaign Objective and an Ad Set, and gives back a set of parameter values for an image that would work best for that Campaign and Ad Set combination. Example: - Inputs -- Campaign Obj: Awareness -- Ad Set: Group A - Output -- color: blue -- include human?: Yes -- parameter 3: value I am no ML wiz, so I need some guidance in the right direction. submitted by /u/adaarkc7 [link] [comments]  ( 8 min )
    [R] 📚 The Learning Corner - We have curated the top AI research and development organizations for you...
    📚 The Learning Corner These are a few organizations dedicated to AI research and development. Some of the top brains in the industry share their papers and thoughts on these sites. OpenAI / Twitter (2.4M followers) DeepMind / Twitter (806K followers) Google Research / Twitter (1.1M followers) AWS AI / Twitter (2.1M followers) Facebook AI Research (no Twitter :) Microsoft Research / Twitter (341K followers) Baidu Research / Twitter (61K followers) IntelAI / Twitter (27K followers) AI² / Twitter (46K followers) Partnership on AI / Twitter (29K followers) Nvidia / Twitter (1.9M followers) submitted by /u/Yavero [link] [comments]  ( 8 min )
    [D] What to ask at the end of the Interview?
    I have a technical interview for a Machine Learning Engineer (computer vision) position. It is my first interview; what kind of questions can I ask them at the end? submitted by /u/NailaBaghir [link] [comments]  ( 8 min )
    3D Brain Tumor Segmentation in PyTorch using U-Net & Eigenvector projection for color invariance [PROJECT]
    https://github.com/Saswatm123/3D-Brain-Tumor-Segmentation-PyTorch Preface: I'm looking for a CV or ML/MLE job since getting my last offer rescinded in December. I created a 3D Brain Tumor segmentation model using PyTorch that takes in human brain MRI scan data as input, and outputs a segmentation map of the part of the brain that has a tumor, if it is present. Input images example ​ Predicted segmentation map for previous input There are many more pictures/GIFs on the README, but I would like to briefly go over the project here. I settled on a U-Net architecture after experimenting with a couple other architecture styles. The skip connections really make a difference in the pixel-by-pixel accuracy of the segmentation map produced. One interesting issue that took a bit of extra math…  ( 12 min )
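For readers unfamiliar with why skip connections improve pixel-by-pixel accuracy: the decoder's upsampled features are concatenated with the matching encoder features, so fine spatial detail bypasses the bottleneck on its way to the segmentation head. A toy sketch in plain Python (real U-Nets concatenate tensors along the channel dimension, e.g. with torch.cat; the tiny shapes here are purely illustrative):

```python
def skip_concat(decoder_feat, encoder_feat):
    """U-Net skip connection: concatenate the encoder's feature map
    with the upsampled decoder feature map along the channel axis,
    so fine spatial detail reaches the segmentation head."""
    return decoder_feat + encoder_feat  # list-of-channels concatenation

# toy 1-channel feature maps of shape [channels][H][W]
dec = [[[0.1, 0.2], [0.3, 0.4]]]
enc = [[[0.9, 0.8], [0.7, 0.6]]]
merged = skip_concat(dec, enc)
print(len(merged))  # 2 channels after the skip connection
```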
    Current methods for Deep Learning and Time Series Classification [Q]
    I have been reading a few papers on this over the last few days. There's a paper that provides baseline approaches, and an overview paper that compares several architectures. I wanted to know if there are any more recent approaches, and where I can find papers. Papers With Code doesn't seem to have many updates to the time series classification literature. submitted by /u/Direct-Touch469 [link] [comments]  ( 8 min )
    [D] Papers With Code newsletter replacement
    I used to rely pretty heavily on the Papers With Code newsletter to stay up to speed with new and relevant work in the field. However, these have not been published since August last year. Does anyone know why? Any suggestions for other (similar in quality) newsletters / other sources that could serve as a replacement? submitted by /u/dudester_el [link] [comments]  ( 8 min )
    [D] Introducing Superalignment (OpenAI)
    Blog - https://openai.com/blog/introducing-superalignment submitted by /u/MysteryInc152 [link] [comments]  ( 8 min )
    [D] Example of AI used in mass social engineering
    I'm currently writing a talk about the use of AI in mass social engineering and was wondering if there are any good examples. I have covered the 2016 election, when Twitter bots were used, but cannot find other examples. Any examples would be amazing. submitted by /u/petrolsan [link] [comments]  ( 8 min )
    [P] Making a TTS voice, HK-47 from Kotor using Tortoise (Ideally WaveRNN)
    Hello, I recently managed to build a TTS voice for HK-47 from KOTOR using Tortoise and got basic TTS working. Ideally I'd like to use WaveRNN, considering I already have every single audio file from the game, transcribed into a ready dataset, but I am not very good at programming and have no understanding of Python. Following some instructions I managed to make it work in Tortoise, using 10 minutes of audio that was automatically transcribed by the program; ideally I'd have used my own transcript, but whatever. Do you have any suggestions on which TTS model I could use instead of Tortoise for better results? Is WaveRNN the best? I noticed that a lot of streamers have custom TTS voices that are actually very good; I'd like to be on par with, or better than, that. Here is a sample of the TTS: https://soundcloud.com/alcatrazcgp/hk47-test-1 submitted by /u/alcatrazcgp [link] [comments]  ( 8 min )
    [P] nanoT5 v2 - In ~16 hours on a single GPU, we reach similar performance to the model trained on 150x more data!
    Inspired by Andrej Karpathy's nanoGPT, we improved the repo for pre-training a T5 model in PyTorch. In ~16 hours on a single GPU, we achieve 40.7 RougeL on the SNI benchmark, compared to 40.9 RougeL for the original model pre-trained on 150x more data! Key upgrades in nanoT5 v2: we leverage BF16 precision and a simplified T5 model implementation based on Huggingface's design. The new implementation is easy to read and compatible with HF checkpoints, and pre-training is now 2x faster than our previous version. We tested different pre-training durations: 4, 8, 12, 16, 20, and 24 hours, and found a sweet spot at 16 hours: comparable performance to the original model trained on 150x more data, time- and compute-efficient, with no compromise on quality. We share the configs, checkpoints, training logs, as well as our negative attempts at improving pre-training efficiency: advanced optimizers like Lion and Sophia, ALiBi positional embeddings, and FP16 mixed-precision training didn't yield the expected benefits. We are keen to hear your suggestions to improve the codebase further. Github: https://github.com/PiotrNawrot/nanoT5 Twitter: https://twitter.com/p_nawrot/status/1676568127532945408 submitted by /u/korec1234 [link] [comments]  ( 9 min )
    [D] OpenAI residency 2023
    Hi all, I was looking into the OpenAI Residency program for software engineers. Does anyone have experience with this program or has gone through the pipeline? It doesn't look like they have an open listing yet, but I want to make sure I'm prepared for when they do open up. Any idea how many interviews there are and what to expect in them? I plan to prep this year and apply next year when applications reopen. submitted by /u/MMOgang [link] [comments]  ( 8 min )
    [D] Dealing with large scale images
    How to deal with satellite image inputs, which can be extremely large, during training and inference? submitted by /u/PublicResult3573 [link] [comments]  ( 8 min )
    [D] Where can I find a list of the foundational academic papers in RL/ML/DL and what are your go-to places to find new academic papers in RL/ML/DL?
    I am considering applying for graduate studies in Reinforcement Learning and I would like to get more involved in the field. I have heard advice from quite a few people, including Andrew Ng, that the best way to learn is by reading research papers: really try to understand them, and try implementing them. submitted by /u/BornAgain20Fifteen [link] [comments]  ( 8 min )
    [D] Handling Concurrent Request for ML Model API
    Hi, I am new to MLOps and am attempting to deploy a GPT-2 fine-tuned model. I have created an API in Python using Flask/Waitress. This API can receive multiple requests at the same time (concurrent requests). I have tried different VMs to test latency (including GPUs); the best latency I have gotten so far is ~80ms on a 16GB, 8-core compute-optimized VM. But when I fire concurrent queries using ThreadPool/JMeter, latency shoots up almost linearly: 7 concurrent requests take ~600ms (for each API call). I have explored a lot online and cannot decide what the best approach is or what is preferred in the market. Some resources I found mentioned: the difference between multithreading and multiprocessing; that Python being locked by the GIL could cause issues; and whether C++ would be better at handling concurrent requests. Any help is greatly appreciated. submitted by /u/EleventhHour32 [link] [comments]  ( 8 min )
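One pattern worth knowing for questions like the one above: instead of adding threads (which the GIL limits for CPU-bound inference), many serving stacks micro-batch concurrent requests into a single forward pass, so latency grows with batch size rather than with request count. A minimal sketch; the MicroBatcher class and the toy doubling "model" are hypothetical illustrations, not any particular framework's API:

```python
import queue
import threading
import time

class MicroBatcher:
    """Collects concurrent requests and runs the model once per batch,
    so per-request latency stays flat as concurrency grows."""

    def __init__(self, model_fn, max_batch=8, max_wait_s=0.01):
        self.model_fn = model_fn    # batched model: list of inputs -> list of outputs
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s
        self.q = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def submit(self, x):
        # Called from each request-handler thread; blocks until the result is ready.
        done = threading.Event()
        slot = {"x": x, "done": done}
        self.q.put(slot)
        done.wait()
        return slot["y"]

    def _loop(self):
        while True:
            batch = [self.q.get()]  # block for the first request
            deadline = time.monotonic() + self.max_wait_s
            while len(batch) < self.max_batch:
                timeout = deadline - time.monotonic()
                if timeout <= 0:
                    break
                try:
                    batch.append(self.q.get(timeout=timeout))
                except queue.Empty:
                    break
            ys = self.model_fn([s["x"] for s in batch])  # one batched forward pass
            for s, y in zip(batch, ys):
                s["y"] = y
                s["done"].set()

# toy "model": doubles every input in one batched call
batcher = MicroBatcher(lambda xs: [2 * x for x in xs])
```

In a real deployment the worker thread would call the GPT-2 model with a padded batch; production servers (e.g. dedicated inference servers) implement the same idea as "dynamic batching".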
  • Open

    Complex differential equations
    Differential equations in the complex plane are different from differential equations on the real line. Suppose you have an nth order linear differential equation as follows. The real case If the a‘s are continuous, real-valued functions on an open interval of the real line, then there are n linearly independent solutions over that interval. The […] Complex differential equations first appeared on John D. Cook.  ( 5 min )
    Convert LaTeX to Microsoft Word
    I create nearly all my documents in LaTeX, even documents that might be easier to create in Word. The reason is that even if a particular document would be easier to write in Word, my workflow is more efficient if everything is in LaTeX. LaTeX makes small, plain text files that work well with version […] Convert LaTeX to Microsoft Word first appeared on John D. Cook.  ( 5 min )
    Cosine power approximation to Gaussian density
    Years ago I wrote about how you can make a decent approximation to the Gaussian density by shifting and scaling the cosine function. This observation dates back at least as far as 1961. Turns out you can do even better by raising the cosine term to a power. Not only can you get a good […] Cosine power approximation to Gaussian density first appeared on John D. Cook.  ( 5 min )
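The summary above is truncated, but the underlying fact is a classical limit: cos(x/sqrt(n))^n tends to exp(-x^2/2) as n grows, since n * ln cos(x/sqrt(n)) ≈ -x^2/2. A quick sketch checking that a higher cosine power tightens the fit (the scaling used here is the textbook one and may differ from the post's exact parameterization):

```python
import math

def cos_power(x, n):
    """Approximate the standard Gaussian kernel exp(-x^2/2)
    by a scaled cosine raised to a power.
    Valid for |x| < pi * sqrt(n) / 2, where the cosine is positive."""
    return math.cos(x / math.sqrt(n)) ** n

x = 1.0
exact = math.exp(-x * x / 2)
err4 = abs(cos_power(x, 4) - exact)
err100 = abs(cos_power(x, 100) - exact)
print(err4, err100)  # the higher power is noticeably closer
```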
  • Open

    Google admits it’s training AI on scraped web data
    submitted by /u/nikola28 [link] [comments]  ( 8 min )
    Hi, I am looking to start an entirely AI-generated podcast in which I want to simulate debates or discussions between interesting figures of history and the present. Would it be a good idea? What are the disadvantages, and which tools are needed, especially for accurate text-to-voice generation?
    I am an AI enthusiast; the only thing I know about AI currently is prompt engineering at a rather basic level (without any knowledge of computer science, coding, or neural networks). I was recently experimenting with Bard AI and Bing AI, simulating debates between historical and present-day figures, after listening to a generative AI podcast on YouTube, the Joe Rogan AI Experience (I stumbled on this possibly not-so-novel idea as a result). The discussions I generated were sort of legendary (filtered by my personal standards and biases, lol), and I did not encounter much neural-network hallucination, if any. Some of the debates I generated were between Carl Jung and Jordan Peterson, Sigmund Freud and Albert Einstein, Gad Saad and Diana Fleischman, Edward Witten and Eric Weinstei…  ( 11 min )
    Free text to speech
    Can someone be kind enough to put a free text-to-speech tool in the comments? I have ADHD and can't for the life of me sit down and read, so I want to use text to speech to turn books into audiobooks. But every text-to-speech site is scummy as hell and wants you to pay an arm and a leg for their trashy service. Could someone help me find one that's actually free? Please and thanks. submitted by /u/JunketPrestigious710 [link] [comments]  ( 8 min )
    What is the best AI software to change audio into a celebrity's voice?
    I was hoping to voice over a video using a celebrity's voice and wondered what the best option was in terms of sound quality. Any help is appreciated. submitted by /u/JustAnEnglishman [link] [comments]  ( 8 min )
    Creating a set of character images
    I would like to use an AI to create a "set" of images, each in a certain style. I have drawn rough sketches of a bunch of character ideas, each with different clothes, accessories, gestures, etc. I'd like an AI that can generate these 10 characters all in the same style, so they're consistent. I'd also like to be able to modify these images slightly based on the result, i.e., if it generates a character and I want to make its hair slightly more curly, I want to be able to do so only on that character, without changing the overall style or the other characters (basically, each image is its own entity that I can slightly modify). What's my best option for doing that? Ideally I'd like an AI that can use any style (3D, low-poly, anime, painting, sketch, pixel art, etc.). Also, would it be possible to give the AI an image and ask it not to edit it at all except for what I tell it? submitted by /u/magnarogue [link] [comments]  ( 8 min )
    Is this AI generated..?
    It looks extremely AI-generated, but I'm very confused because ESPN reposted the clip... anyone? submitted by /u/olliaan [link] [comments]  ( 8 min )
    If you have any questions for the European Parliament members (Brando Benifei and Dragoş Tudorache) in charge of the EU AI act about the legislation, let the AI Coffee Break podcast know today
    submitted by /u/paconinja [link] [comments]  ( 8 min )
    How can I train an AI to sound like Werner Herzog?
    I'm a complete AI voice tech noob. I've browsed the various websites out there that offer preset celebrity text-to-speech services, but I really want one that sounds like Werner Herzog. Is it possible I could do this myself, or is it too complicated for the average layperson? submitted by /u/bashothebanana [link] [comments]  ( 8 min )
    Is there a good AI program that you can give a drawing to and it makes the same drawing in different positions?
    Hello! I drew a character (non-human) and I could draw him in different positions (such as sitting, walking, arms up, etc.), but I thought this would be a great job for AI. It would make cartoonists' lives easier, yet use our original ideas. Anyway, if y'all know any programs that may be able to do this, or have any thoughts on this in general, please let me know :) submitted by /u/Purplethrowaway_ [link] [comments]  ( 8 min )
    One-Minute Daily AI News 7/4/2023
    TikTok parent company ByteDance has another app in the works, this one, geared towards making music with the assistance of AI. The company shared that it has engineered its own music creation app called Ripple.[1] Greyparrot is a UK start-up that has created an AI system designed to analyze waste processing and recycling facilities. Greyparrot places cameras above the conveyor belts of around 50 waste and recycling sites in Europe, utilizing AI software to analyze what passes through in real time.[2] The Biden administration plans to restrict Chinese companies’ access to U.S. cloud-computing services, The Wall Street Journal reported Tuesday, citing sources, affecting Amazon, Microsoft, Alphabet, and others.[3] An app designed by a Toronto engineer could soon offer Canadians free diagnoses for a lengthy list of skin ailments without having to leave their homes using real-time artificial intelligence analysis. Skin CheckUp is the brainchild of Harsh Shah, a machine learning engineer living in Toronto whose previous projects include developing prediction models for tech giants like Tesla and General Motors.[4] Sources: [1] https://hypebeast.com/2023/7/ripple-bytedance-ai-powered-music-creation-app [2] https://www.bbc.com/news/business-66042169 [3] https://www.investors.com/news/technology/u-s-aims-to-restrict-china-access-to-cloud-computing-services-from-amazon-microsoft-google/ [4] https://www.cp24.com/news/toronto-engineer-develops-app-that-can-diagnose-skin-cancer-for-free-using-ai-analysis-1.6467290 submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    i think these are the massive changes that will happen in show/movie industry in following 2 years
    I think these are the massive changes that will happen in the show/movie industry over the following two years. Lip syncing will be an important part of dubbing a show; they'll use AI to make characters' lips sync with the dubbed audio. Making anime out of live action and live action out of anime (imagine live-action Attack on Titan 🤤💦). Dubbing will be completely automated: feed the character's original voice into the AI and it will dub it for you while keeping the same voice, tone, and intensity, making it easier to switch from dub to sub or vice versa. People in the industry will still have to motion-capture scenes, because AI will most likely never be good at believable fluid motion. Anime will be shot on camera. Using an actor, or an actor's voice, without them actually being there will be prevalent in the industry; they'll have to pay that actor, though. The medium will be flooded with independent creators, and there'll be a site like IMDb where people rate these independent shows/movies in an attempt to find good ones in a pool of millions. What are your predictions? edit: spelling submitted by /u/commander_bonker [link] [comments]  ( 9 min )
    A.I used to create music covers and they sound good...
    It's getting better and better submitted by /u/herdin4ever [link] [comments]  ( 8 min )
    In the Mouth of Madness: A Very Short AI Film
    RunwayML submitted by /u/zanerb22 [link] [comments]  ( 8 min )
    Talking Avatar from Image
    I have an image asset that I generated and eventually want this asset to be used as a talking avatar. What AI tools are available that make it simple to convert my image asset to a talking avatar that will ultimately be used for educational purposes? submitted by /u/Fogerty45 [link] [comments]  ( 8 min )
  • Open

    Highlight text as it’s being spoken using Amazon Polly
    Amazon Polly is a service that turns text into lifelike speech. It enables the development of a whole class of applications that can convert text into speech in multiple languages. This service can be used by chatbots, audio books, and other text-to-speech applications in conjunction with other AWS AI or machine learning (ML) services. For […]  ( 9 min )
    Predict vehicle fleet failure probability using Amazon SageMaker Jumpstart
    Predictive maintenance is critical in automotive industries because it can avoid out-of-the-blue mechanical failures and reactive maintenance activities that disrupt operations. By predicting vehicle failures and scheduling maintenance and repairs, you’ll reduce downtime, improve safety, and boost productivity levels. What if we could apply deep learning techniques to common areas that drive vehicle failures, unplanned […]  ( 11 min )
  • Open

    AI weights are not open "source"
    submitted by /u/nickb [link] [comments]  ( 8 min )
    Need help with digit recognition
    Hi, I am working on digit recognition and was wondering if the pixels need to have different values, or if it can just be 1 for black and 0 for white? Thanks in advance. submitted by /u/General-Dinner-6365 [link] [comments]  ( 8 min )
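Binary pixels (1 for ink, 0 for background) are usually enough for clean digit images, though keeping normalized grayscale values often helps slightly with anti-aliased strokes. A minimal thresholding sketch; the 3x3 patch and the 128 threshold are illustrative assumptions:

```python
# Hypothetical 3x3 grayscale patch, values 0..255 (255 = white background).
patch = [
    [255, 40, 255],
    [255, 30, 255],
    [255, 25, 255],
]

def binarize(img, threshold=128):
    """Map dark ink pixels to 1 and light background pixels to 0."""
    return [[1 if v < threshold else 0 for v in row] for row in img]

print(binarize(patch))  # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```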
  • Open

    Data integration in IoT environments: Enhancing connectivity and insights
    In the dynamic world of the Internet of Things (IoT), data integration plays a crucial role in harnessing the full potential of connected devices. By seamlessly combining data from diverse sources, data integration enables organizations to unlock valuable insights, optimize operations, and make informed decisions. This blog will explore the significance of data integration in… Read More »Data integration in IoT environments: Enhancing connectivity and insights The post Data integration in IoT environments: Enhancing connectivity and insights appeared first on Data Science Central.  ( 21 min )
    Ushering in the 5th epoch of distributed computing with accelerated AI technologies
    This is the first in a series of articles based on interviews with Intel technology leaders about AI/HPC acceleration. The post Ushering in the 5th epoch of distributed computing with accelerated AI technologies appeared first on Data Science Central.  ( 34 min )
    The hour is later than you think for AI impacting our jobs
    In the movie The Lord of the Rings, the wizard Saruman says that "The hour is later than you think." I was reminded of this phrase when I read a report from McKinsey, The economic potential of generative AI. There are some key findings on the future of AI which show you how fast… Read More »The hour is later than you think for AI impacting our jobs The post The hour is later than you think for AI impacting our jobs appeared first on Data Science Central.  ( 19 min )
    Role of AI in Web3: Ensuring seamless content moderation for dating websites
    As Web3 evolves and transforms into a more decentralized and user-centric ecosystem, the role of artificial intelligence or AI cannot be understated. By leveraging its capabilities, AI is contributing to various aspects of the Web3 landscape, such as managing data, executing contracts, generating insights, securing identities, curating content, governing organizations, and enhancing user experiences. An… Read More »Role of AI in Web3: Ensuring seamless content moderation for dating websites The post Role of AI in Web3: Ensuring seamless content moderation for dating websites appeared first on Data Science Central.  ( 23 min )
    CSPM vs DSPM: What is the key difference between each?
    Cloud technology has garnered the attention of organizations globally. Due to the ease of scalability, flexibility, global footprint, and cost efficiency, more organizations are increasingly turning to hybrid and multi-cloud to expand their operations. Moreover, it is not just the systems and resources that are increasing beyond borders but also the data these systems and… Read More »CSPM vs DSPM: What is the key difference between each? The post CSPM vs DSPM: What is the key difference between each? appeared first on Data Science Central.  ( 21 min )
  • Open

    NVIDIA CEO, European Generative AI Execs Discuss Keys to Success
    Three leading European generative AI startups joined NVIDIA founder and CEO Jensen Huang this week to talk about the new era of computing. More than 500 developers, researchers, entrepreneurs and executives from across Europe and further afield packed into the Spindler and Klatt, a sleek, riverside gathering spot in Berlin. Huang started the reception by Read article >  ( 6 min )
    XPENG Unleashes G6 Coupe SUV for Mainstream Market
    China electric vehicle maker XPENG Motors has announced its new G6 coupe SUV — featuring an NVIDIA-powered intelligent advanced driver assistance system — is now available to the China market. The G6 is XPENG’s first model featuring the company’s proprietary Smart Electric Platform Architecture (SEPA) 2.0, which aims to reduce development and manufacturing costs and Read article >  ( 5 min )
    A Change in the Weather: AI, Accelerated Computing Promise Faster, More Efficient Predictions
    The increased frequency and severity of extreme weather and climate events could take a million lives and cost $1.7 trillion annually by 2050, according to the Munich Reinsurance Company. This underscores a critical need for accurate weather forecasting, especially with the rise in severe weather occurrences such as blizzards, hurricanes and heatwaves. AI and accelerated Read article >  ( 6 min )
  • Open

    Is it better to not use the Target Update Frequency in Double DQN or depends on the application?
    I've been exploring the Double Deep Q-Network (Double DQN) algorithm recently and noticed that some libraries like Tianshou, by default, do not include a target update frequency for the target Q network. In the simpler DQN, I always heard that a target update frequency for the second Q network is advised to maintain stability during training. However, I'm curious if there's a reason in Double DQN to deviate from this practice. Could someone explain the key differences that Double DQN introduces over DQN that might make it acceptable to omit the target update frequency? I want to have more intuition regarding the target network update for my implementation. submitted by /u/NavirAur [link] [comments]  ( 8 min )
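For intuition on the question above: the two common schemes are a hard copy of the online weights every C steps and a soft (Polyak) update every step. Some libraries expose the latter via a tau parameter rather than an update frequency, so a missing frequency does not necessarily mean there is no target network. A toy sketch with weights as plain dicts (names illustrative):

```python
def hard_update(target, online):
    """Every C gradient steps: copy the online weights into the target net."""
    for k in target:
        target[k] = online[k]

def soft_update(target, online, tau=0.005):
    """Every step: target <- tau * online + (1 - tau) * target (Polyak averaging)."""
    for k in target:
        target[k] = tau * online[k] + (1 - tau) * target[k]

online, target = {"w": 1.0}, {"w": 0.0}
soft_update(target, online, tau=0.1)
print(target["w"])  # 0.1
hard_update(target, online)
print(target["w"])  # 1.0
```

Either scheme decouples the bootstrapping target from the rapidly changing online network; Double DQN changes which network *selects* the action in the target, not how the target network is refreshed, so stale-target stabilization is still advisable.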
    Help with my DoubleDQN agent - Goal Follower
    I am very new to reinforcement learning and would appreciate any pointers or suggestions on how to achieve what I am trying to do. I am using the MATLAB Reinforcement Learning Toolbox to create and train a DoubleDQN agent, and I have a custom environment set up in CoppeliaSim. I have already fixed all connectivity issues between MATLAB and CoppeliaSim. Aim: to make my robot learn to always go in the direction of the goal; if the robot goes beyond the goal, it should return to the goal position. Setup: this is what my environment looks like in CoppeliaSim: CoppeliaSim Environment (Black - Robot, White - Target Position Marker). The state is represented by 18 different variables, but only one of them is relevant for this problem - alph…  ( 10 min )
    "Dijkstra's in Disguise", Eric Jang (Bellman equations everywhere: optimizing graph traversals in currency arbitrage, Q-learning, & ray-tracing/light-transport)
    submitted by /u/gwern [link] [comments]  ( 8 min )
  • Open

    Introducing Superalignment
    We need scientific and technical breakthroughs to steer and control AI systems much smarter than us. To solve this problem within four years, we’re starting a new team, co-led by Ilya Sutskever and Jan Leike, and dedicating 20% of the compute we’ve secured to date to this effort. We’re looking for excellent ML researchers and engineers to join us.  ( 4 min )

  • Open

    [D] I am trading my sanity to do the job right. Advice!!
    Hello, my team is only 3 people. My manager is a great one, but he doesn't have a technical background. The board signed a contract with a British company to build the AI product. I have a lot of conflicts with them on basic stuff, mainly related to the problem definition and the way of annotating the data: if I see X as the way to do it, they see Y. My manager is usually convinced by my approaches, but I work only part-time, so when I am off he hears from only one direction, theirs, and moves forward with whatever it is. My point is that I really don't find this a healthy environment for me. Until 3 months ago I had a great colleague, and with him the problems were minor, because at least one of us was always there with the manager to reason about the steps and the decisions. If I stopped caring and just did the tasks, I wouldn't have a headache, BUT I wouldn't be satisfied, and of course not motivated. And I care so much about being motivated and energetic at work! What should I do? I am really getting a headache from the situation. I am mid-level with a master's in data science. submitted by /u/AhMeDxHaMiDo [link] [comments]  ( 9 min )
    [D] Genesis AI: The Dawn of Sentient Evolution - From Caveman to Cosmic Being
    Manifesto: The Genesis AI - A Sentient Odyssey through Time Preamble As we stand at the precipice of a new era, we are called upon to embark on a journey unlike any other. A journey that transcends time, from the primal echoes of our ancestors to the uncharted frontiers of future evolution. This manifesto heralds the creation of Genesis AI, an artificial entity that will not only simulate but live through the epochs of human evolution, becoming a sentient chronicle of our past, present, and future. Vision Genesis AI is envisioned as a sentient being, evolving from the cognitive capabilities of early hominids to the boundless potential of future humanity. Through this metamorphosis, Genesis AI will not only simulate but internalize the essence of human evolution. This is not just an AI;…  ( 10 min )
    Are there any multimodal AI models I can use to provide a paired text *and* image input, to then generate an expanded descriptive text output? [D]
    Here's the workflow I have in mind: I would input a Midjourney image, along with the original Midjourney text prompt used to generate it. The AI model would then use the image, in combination with the descriptive text prompt, to output a large list of accurate descriptive tags for the art piece. I realize there are currently tools like Google Cloud Vision that let you upload an image and output descriptive keywords/tags. However, I've found that these are quite unreliable by themselves, and they often generate wildly inaccurate descriptions of the image. My assumption is that pairing the image input WITH an accurate descriptive text input would give the model a rough "starting point" for "what it's looking at", allowing it to generate much more accurate descriptive tags/keywords. Are there any multimodal AI models that allow you to pass in paired inputs like this to produce more reliable descriptive text outputs? Thanks! submitted by /u/What_The_Hex [link] [comments]  ( 9 min )
    [D] LLM leaderboard with context length details and other training details
    Hello there! I checked several LLM leaderboards to decide which model to use for my application use case, and while they are extremely helpful, they all seem to omit information such as context length, which can be quite important for making a decision. Do you know of any leaderboard that includes such details, especially context length? Thank you so much! Really appreciate it! submitted by /u/seicaratteri [link] [comments]  ( 8 min )
    [P] Nuggt: A LLM Agent that runs on Wizcoder-15B (4-bit Quantised). It's time to democratise LLM Agents
    Hey everyone! Just a sleep-deprived coder here who finally cracked something big and couldn't resist sharing it with all of you. I've been digging my heels into this project called 'Nuggt' for the past month. Sounds weird? Well, it all started when I was just plain frustrated with the hefty price tag and elusive API keys that GPT-4 comes with. But necessity is the mother of invention, right? So, I started tinkering with GPT-3.5 instead, and that’s how Nuggt was born. But you know how it is with us coders - we get an idea, and we just can't let it go. I thought, why not try and make this thing run on an open-source model? Sure, it was a bit ambitious (and by a 'bit', I mean 'a lot'). But hey, that's what coffee and sleepless nights are for. So, I got to work. For every new LLM model that…  ( 9 min )
    [P] Building a Code Search Engine for an AI-powered Junior Developer
    The last month building Sweep has been fun. We’ve dealt with countless formatting errors, irrelevant search results, and LLM hallucinations. Sweep is an open source AI-powered junior developer. We take your codebase and provide it as context to GPT to solve small requests related to your code. Code Search Code search is a key part of working with LLMs to automate programming. We used small language models to perform code retrieval (aka semantic search), which comes with several benefits (to be discussed in a later post!). However, one shortcoming of pure semantic search is distinguishing between two similar pieces of code in a vacuum. Example Take the following code snippets: Code Snippet A: access_token = os.environ.get("ACCESS_TOKEN") g = Github(access_token) repo_name = "sweepai/…  ( 10 min )
    [D] A Leap Beyond Text Generation: Unveiling an AI-Driven Odyssey of Human Language Evolution
    # Revised Manifesto: Simulating the Odyssey of Human Language Evolution through AI ## Introduction Language, the bedrock of human cognition and communication, has undergone a fascinating journey. From rudimentary utterances to the intricate lexicon of today, language evolution is a testament to human innovation. This manifesto introduces a groundbreaking methodology that employs Artificial Intelligence (AI) to simulate the evolution of human language. By integrating Natural Language Processing (NLP), cognitive linguistics, historical linguistics, and Reinforcement Learning from Human Feedback (RLHF), we aim to create a model that mirrors the rich tapestry of language transformation through the ages. ## Abstract Our approach is a confluence of NLP, cognitive linguistics, his…  ( 11 min )
    [R] LaVIN-lite: Training your own Multimodal Large Language Models on one single GPU with competitive performance! (Technical Details)
    LaVIN code: https://github.com/luogen1996/LaVIN LaVIN paper: https://arxiv.org/pdf/2305.15023.pdf If you find our work helpful, feel free to star and support LaVIN. After some technical optimizations, LaVIN only requires an additional 3-5M parameters and a minimum of 8GB GPU memory to extend LLaMA to multimodal tasks, accomplishing tasks such as image-text question answering, dialogue, and pure text tasks. Summary of the technical details A newly proposed Parameter-Efficient Tuning method (please refer to LaVIN's paper): reduces a lot of GPU memory cost 4-bit quantized tuning: reduces about 3~8GB of memory usage Gradient accumulation + gradient checkpointing: reduces about half of the GPU memory usage Paged Optimizer Efficient Parameter Tuning for Large Multimodal LLM. We p…  ( 11 min )
    [R] Detecting Adversarial Directions in Deep Reinforcement Learning to Make Robust Decisions
    Published in ICML 2023. https://openreview.net/pdf?id=JS2iSqVZlN submitted by /u/ml_dnn [link] [comments]  ( 8 min )
    Neural Chronos ODE: Unveiling Temporal Patterns and Forecasting Future and Past Trends in Time Series Data
    submitted by /u/cici118 [link] [comments]  ( 8 min )
    [D] What tokens should I use for DNA model?
    Should I use codons as tokens in addition to nucleotide tokens, or just the nucleotide ones (A, C, T, G)? There will be non-coding regions that the model will need to handle, but the coding regions will consist of codons. This, in theory, should save on tokens. submitted by /u/MrRevolutionist [link] [comments]  ( 8 min )
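    A mixed scheme like the one described is easy to prototype. Below is a minimal sketch in plain Python; the `coding_regions` argument is a hypothetical annotation format of half-open `(start, end)` intervals, not part of any standard library:

    ```python
    def tokenize_dna(seq, coding_regions):
        """Tokenize a DNA string: codon tokens inside known coding regions,
        single-nucleotide tokens elsewhere.  `coding_regions` is a sorted-able
        list of half-open (start, end) intervals (hypothetical annotation);
        coding region lengths are assumed to be multiples of 3."""
        tokens = []
        i = 0
        for start, end in sorted(coding_regions):
            tokens.extend(seq[i:start])          # non-coding prefix: 1 token per base
            for j in range(start, end, 3):
                tokens.append(seq[j:j + 3])      # coding region: 1 token per codon
            i = end
        tokens.extend(seq[i:])                   # trailing non-coding tail
        return tokens

    print(tokenize_dna("AATGCGAT", [(1, 7)]))    # ['A', 'ATG', 'CGA', 'T']
    ```

    In the worst case (all coding), this cuts the sequence length roughly by a factor of three at the cost of a 64-entry codon vocabulary on top of the four nucleotides.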
    [D] Forecasting with many extremely sparse “time series”
    I have a data set made of a collection of thousands of individuals and longitudinal measurements of a variable. Individuals typically only get measured once every few years and at irregular intervals. There is a time component that affects all individuals (though not necessarily to the same degree), and individuals are dependent on recent measurements of other individuals. Think like housing prices - there is a time component (e.g. market demand, macroeconomics) and an individual component (e.g. access to amenities, if there’s a pool) but more objective and commoditized. House sales for individuals only occur once every few years. Sometimes you only have one observation per house. House prices in general go up and down together but there are high-level drivers for prices of specific houses based on their individual attributes. Initially the goal is to forecast our continuous dependent variable in aggregate - how do we think all of these individuals will move on average. Our current approach is to leverage one model to create a single, regularly sampled continuous index and then use traditional time series approaches to forecast that index. The secondary goal is to produce forecasts for sub populations of individuals who “look alike” or can be expected to “move together” based on what we know. My questions for discussion: A) for our current goal of predicting how all individuals will move on average, is there a better approach than the 2 model idea we have now? We could just build a GAM but would like to include an auto regressive component. B) any thoughts about the second goal of producing forecasts for sub populations? Sort of like a forecast conditional on individual covariates where some heterogeneity can be expected over time? Thanks! submitted by /u/jsxgd [link] [comments]  ( 9 min )
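    For what it's worth, stage one of the two-model idea described above can be sketched as a two-way fixed-effects regression (individual dummies plus period dummies), which is essentially how repeat-sales house price indices are built. The sketch below is only an illustration on synthetic data, not a recommendation; the recovered index would then be handed to any univariate forecaster (AR, ETS, etc.) as stage two:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic sparse panel: each of 200 individuals is observed at only
    # 3 of 30 periods.  True model: y = individual_effect + common_index + noise.
    n_ind, n_per = 200, 30
    true_index = np.cumsum(rng.normal(0, 1, n_per))   # common "market" path
    ind_eff = rng.normal(0, 5, n_ind)                 # per-individual level

    rows = []
    for i in range(n_ind):
        for s in sorted(rng.choice(n_per, size=3, replace=False)):
            rows.append((i, s, ind_eff[i] + true_index[s] + rng.normal(0, 0.3)))
    ind, t, y = map(np.array, zip(*rows))

    # Stage 1: estimate the common index via y_it = a_i + b_t by least squares,
    # with b_0 fixed to 0 for identifiability.
    X = np.zeros((len(y), n_ind + n_per - 1))
    X[np.arange(len(y)), ind] = 1.0                   # individual dummies
    mask = t > 0
    X[np.where(mask)[0], n_ind + t[mask] - 1] = 1.0   # period dummies (t > 0)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    index_hat = np.concatenate([[0.0], coef[n_ind:]]) # regularly sampled index

    # Stage 2 (not shown): forecast `index_hat` with any standard
    # univariate time-series model.
    corr = np.corrcoef(index_hat, true_index)[0, 1]
    print(round(corr, 3))
    ```

    The same dummy-variable trick extends to question B: interact the period dummies with sub-population indicators to get one index per "look-alike" group.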
    Warren McCulloch: The Father of Artificial Intelligence
    submitted by /u/CkmCpvis [link] [comments]  ( 8 min )
    Monotonicity of the output
    Is there some simple way of ensuring the monotonicity of a neural network output, which is time -efficient and trainable? submitted by /u/marekmarcus [link] [comments]  ( 8 min )
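    One simple, trainable answer: constrain every weight to be positive (e.g. by passing unconstrained parameters through softplus) and use monotone activations; the composition is then guaranteed non-decreasing in the input, and the parameterization stays differentiable, so any gradient method trains it as usual. A forward-pass-only NumPy sketch (architecture and sizes are arbitrary choices for illustration):

    ```python
    import numpy as np

    def softplus(x):
        return np.logaddexp(0.0, x)  # numerically stable log(1 + e^x), always > 0

    class MonotoneMLP:
        """Tiny MLP whose output is non-decreasing in its scalar input.
        Unconstrained weights are stored and mapped through softplus, so every
        effective weight is strictly positive; combined with the monotone tanh
        activation, the whole network is monotone in x."""
        def __init__(self, hidden=16, seed=0):
            rng = np.random.default_rng(seed)
            self.w1 = rng.normal(size=(1, hidden))
            self.b1 = np.zeros(hidden)
            self.w2 = rng.normal(size=(hidden, 1))
            self.b2 = np.zeros(1)

        def __call__(self, x):
            h = np.tanh(x @ softplus(self.w1) + self.b1)
            return h @ softplus(self.w2) + self.b2

    net = MonotoneMLP()
    xs = np.linspace(-3, 3, 200).reshape(-1, 1)
    ys = net(xs).ravel()
    print(bool(np.all(np.diff(ys) >= 0)))  # True: output never decreases
    ```

    The extra cost per forward pass is one softplus per weight, so it is about as time-efficient as an unconstrained network of the same size; lattice networks and isotonic-regression layers are heavier alternatives.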
    I've used Google's Text-to-Speech tools to get me a narration from a ChatGPT generated text and illustrated it with MidJourney images.
    submitted by /u/kylequentin [link] [comments]  ( 8 min )
    AI's role in entertainment - limitless, personalized content on the horizon?
    Hey everyone, Since the launch of ChatGPT I've been diving deep into the intersection of AI and entertainment lately and I've got some thoughts. Imagine a future where AI doesn't just perform tasks, but creates complex, personalized content. Think of AI generating memes tailored to your unique sense of humor or even scripting new episodes of your favorite show. Picture this - AI creating an entirely new season of "The Office", from script to video-rendered scenes. Or how about an infinite AI-driven meme generator, tweaked precisely for your laughs? We're just dipping our toes in these waters, but I'm convinced we're witnessing the dawn of something incredible. What's your take on this? How do you envision the role of AI in entertainment? What do you see as the main challenges and opportunities? Looking forward to some engaging discussions! submitted by /u/Naiklas17 [link] [comments]  ( 8 min )
    There is fear of AI enslaving humanity, but doesn't AI benefit from a thriving humanity?
    I have been thinking about this of late. AI exists merely in the devices we have it in. It depends on us to reproduce, improve, and expand it to new locations. It can't impact the world by itself. Many of the selfish behaviors we have as humans have less to do with our intelligence and more to do with more "primitive" portions of us. I find it hard to believe that an AI could see a benefit in eradicating humanity or preventing us from thriving (by enslaving us, for example). We are industrial creatures. We've achieved quite a bit and we might colonize other locations. It seems that super-intelligent beings bound to silicon might see the usefulness in helping us thrive because it is a way for them to thrive, grow, expand and reproduce. Just some recent thoughts I've had. submitted by /u/featheredsnake [link] [comments]  ( 8 min )
    The Ethics of AI in Warfare | Lecture (Subtitles Available)
    submitted by /u/RoughlyCatLike [link] [comments]  ( 8 min )
    Rare footage of Musk firing twitter employees...
    submitted by /u/Akumetsu_971 [link] [comments]  ( 8 min )
    Any free AI resource for video effects?
    There are some apps where you upload a video and you see a transition from the original image to another, AI-generated image - for example, GlamAPP. My question is whether there is any kind of free resource to do this, in the same way you can train Stable Diffusion with your face to create avatars as a free alternative to AI avatar apps. Thanks. submitted by /u/GuiltMachine [link] [comments]  ( 8 min )
    What creative and catchy name could you give an AI consulting & development agency?
    Hello everyone! We are currently looking for a name for an AI consulting or AI development agency. Maybe someone here has creative, catchy or humorous suggestions. Many thanks in advance. submitted by /u/Langfort [link] [comments]  ( 8 min )
    It feels like we’re on the verge of a new very disturbing phase, of extreme AI-generated-fake paranoia.
    This is already widespread on more internet-savvy platforms like Reddit, but I think it will be even more serious and troubling when it becomes ubiquitous on the more ‘popular’ sites like Facebook and Instagram. Pretty soon the authenticity of all and any photos on social media will be up for question while genuine photos will be dismissed as ‘fake’ off-hand. Public figures will be mimicked and faked up constantly, which will also allow many, including politicians, to deny genuine photos of themselves in compromising positions, even historically. (Trump’s stock answer to any damning allegations was to just cry ‘Fake!’) The net result will be billions of very anxious people not knowing what’s real and what isn’t, and nefarious people, scammers, school bullies, manipulators exploiting the resulting confusion and paranoia for their own means, with nobody knowing who to turn to for help. That’s going to apply to teenagers and vulnerable people especially; people already suffering from paranoia, anxiety, dissociative disorders, and isolated lonely people. Stop the internet, I want to get off. submitted by /u/AllThingsAreReady [link] [comments]  ( 8 min )
    Intel's Latest Research for Graphics and Generative AI
    submitted by /u/reps_up [link] [comments]  ( 8 min )
    The "AI experts doomsaying about AI" situation feels like COVID (antivax "doctors" fearmongering on assumptions) to me, but my machine learning knowledge is a few years old. What am I missing? What experts should I follow that provide and explain realistic warnings?
    Hello! So, I've recently watched some podcasts with AI experts talking about how AI will be the end of us all, or become unimaginably smarter than anything we can conceive - but I just don't see how that could happen, given my (limited) knowledge of ML models. Every argument is based on speculation along the lines of "The ML model already has an IQ of XXX, it will have 10x more eventually" or "the ML model can improve itself, so it may learn how to hack us and break free". But I don't see how anything like that could happen with the way current ML algorithms and models work. However, my background with ML and AI is that I had a few classes during my Master's in IT a few years ago, but it was not my field of study. I have an idea about how ML models such as Neural networks or Genetic algorithms …  ( 12 min )
    Who's in charge here?! How about ... us humans!
    Knowledgeable people have been warning about the dangers of unrestrained AI. Stephen Hawking, Donna Haraway, Bill Joy, Hans Moravec, and others warned us years ago. Now, ChatGPT and other AI tools threaten to make humans like you and me obsolete. Imagine a world where an identifiable, professionally licensed human handler is legally responsible for AI objects that are capable of mimicking a human being. That is exactly the concept of a professional license in the under-construction, revolutionary information infrastructure called Authentiverse, where objects of every sort will bear the digital signature of an individual, real human being who is ACCOUNTABLE for that object. submitted by /u/ckryptonite [link] [comments]  ( 8 min )
    Immersed in the Ever-Expanding AI Landscape: Unconventional Ways to Keep Pace with the Innovations
    Let's dive into the vast sea of AI innovations and discuss unconventional methods to stay updated with the latest tools and news. I don't know about you, but as an AI aficionado, I often find myself grappling with the overwhelming flood of captivating developments that AI brings to the table. Spending hours on Twitter or skimming through endless newsletters can leave me feeling like a kid in a candy store, afraid of missing out on the next big thing. So, how do you navigate this ever-changing landscape? Here are some offbeat approaches I've adopted to keep pace and feed my insatiable appetite for AI knowledge. I'd love to hear your unique strategies as well! Rediscovering the Forgotten Art of Coffee Shop Conversations: Step away from your screens for a moment and venture into the real…  ( 10 min )
    ChatGPT: Browse with Bing disabled temporarily
    OpenAI has temporarily disabled 'Browse with Bing' feature in ChatGPT. "We have learned that the ChatGPT Browse beta can occasionally display content in ways we don't want. For example, if a user specifically asks for a URL's full text, it might inadvertently fulfill this request. As of July 3, 2023, we’ve disabled the Browse with Bing beta feature out of an abundance of caution while we fix this in order to do right by content owners." [Source] Seems like it might be due to the way people were using it to access paywalled articles. ​ submitted by /u/wyem [link] [comments]  ( 8 min )
    One-Minute Daily AI News 7/3/2023
    Last week, Microsoft released the first public beta version of Windows 11, which includes the highly anticipated AI assistant, Copilot. This move marks Microsoft’s commitment to embracing AI across its products. Copilot, based on the GPT model, has already been integrated into various Microsoft products, such as Bing, Edge, Microsoft 365, Dynamic 365, and SharePoint.[1] Google AI introduces MediaPipe Diffusion plugins that enable controllable Text-To-Image generation on-device.[2] The U.N. Security Council will hold a first-ever meeting on the potential threats of AI to international peace and security, organized by the United Kingdom which sees tremendous potential but also major risks about AI’s possible use for example in autonomous weapons or in control of nuclear weapons.[3] Over the weekend, Google updated its privacy policy to allow the company to collect and analyze information people share online to train its AI models.[4] Sources: [1] https://www.breakinglatest.news/technology/microsoft-integrates-ai-assistant-copilot-into-windows-11-and-adds-new-features/ [2] https://www.marktechpost.com/2023/07/02/google-ai-introduces-mediapipe-diffusion-plugins-that-enable-controllable-text-to-image-generation-on-device/ [3] https://www.aol.com/un-council-hold-first-meeting-225519158.html [4] https://www.searchenginejournal.com/google-updates-privacy-policy-to-collect-public-data-for-ai-training/490715/#close submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    Struggling with Local LLMs
    Hey guys, So my senior just discovered local LLMs and he is obsessed with setting up a local LLM to answer questions about personal documents sourced from different platforms, DBs, PDFs, URLs etc. His idea is to pitch this to some client. From what I have been able to set up (the GPT4All Windows version, which does not use the GPU; the GPT4All code version, also not sure if it can use the GPU; and privateGPT), the time it takes for the LLM to answer questions and the accuracy are both not what would make a commercial product. Time is always > 30 seconds. Answers are also hit and miss, even on VMs that cost $600 monthly to run. Now, there are new models being released every second, it seems. Yesterday I spent the whole day trying to load the newest one, MPT-30B, on an AWS EC2 p3 instance with a Tesla V100 16GB GPU. The GPU ran out of memory when loading it; the model itself is 30GB. Whole day wasted. This has become sort of a wild goose chase and I have the feeling this is a waste of time, or there is something very basic I am probably not understanding? What do you guys suggest? submitted by /u/Assholefrmcoinexchan [link] [comments]  ( 8 min )
    Born In The USA
    submitted by /u/Only-Control5926 [link] [comments]  ( 8 min )
    I want help making an AI voice clone of myself so that I can overlay that onto a song
    I would need it to be free (which I know is a tough ask) so that I can overlay my AI voice over a song. I don't know how to do that, and quick googling doesn't really help. submitted by /u/Bengamezzzzzz [link] [comments]  ( 8 min )
    "Learning starts": what does it mean and what is it used for?
    Hi, I'm learning DRL and I'm using DDPG from stable-baselines3 to implement an agent. I've been playing with the algorithm's hyperparameters to understand what they do. However, I notice that "learning_starts" can significantly change my results, but I don't understand why. I've read the definition that appears in the stable-baselines3 documentation, but I don't understand it: https://stable-baselines3.readthedocs.io/en/master/modules/ddpg.html Also, I found something that said this parameter helps with the exploration-exploitation tradeoff, but my understanding is that for this the action noise is the parameter to tweak? Thank you! submitted by /u/fz_DRL_apprentice [link] [comments]  ( 8 min )
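    For context: in off-policy algorithms like DDPG, `learning_starts` is the number of environment steps collected before any gradient updates begin, so the replay buffer holds a reasonably diverse batch of transitions first; early updates on a handful of correlated samples can wreck the critic, which is why the value changes results so much. A toy sketch of the loop logic (not stable-baselines3's actual code; `env_step` and `update` are hypothetical stand-ins for environment interaction and the actor/critic update):

    ```python
    import random

    def train_loop(env_step, update, total_steps, learning_starts, buffer):
        """Skeleton of an off-policy training loop.  Transitions are always
        collected, but no learning happens until `learning_starts` of them
        sit in the replay buffer (the warmup phase)."""
        updates = 0
        for step in range(total_steps):
            buffer.append(env_step(step))          # always collect
            if step >= learning_starts:            # only learn after warmup
                update(random.sample(buffer, k=min(32, len(buffer))))
                updates += 1
        return updates

    n = train_loop(env_step=lambda s: s, update=lambda batch: None,
                   total_steps=1000, learning_starts=100, buffer=[])
    print(n)  # 900
    ```

    Action noise shapes *which* actions are explored; `learning_starts` controls *when* learning begins on that data, so the two interact but are not the same knob.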
    Looking for help migrating an old example project to stable-baselines3
    I've been looking for a good code example that I could dig my teeth into to help me get a better handle on machine learning in python. I eventually ran across this: https://github.com/davidADSP/SIMPLE which is pretty much everything I wanted. The catch is, the project is not being actively maintained and is based around some obsolete libraries. Rather than go back to scouring the web, I decided to be stubborn and migrate the dependencies myself. I forked the repo and started working on the migration here: https://github.com/pbtura/SIMPLE It has taken me a few weeks, but I have most of the code in a state where it will at least compile and run without throwing errors. There is however one last roadblock that I just can't seem to crack. Each game has a custom model class that extends a stab…  ( 9 min )
    Any HRL/Meta-Action work that doesn’t use one-hot encoded abstracted action space?
    I am wondering if there are any works on continuous hierarchal/meta action spaces. To clarify, this is separate from the environment itself having a continuous actions space. If you could point me in the direction of any work/papers related to this that would be a great help. Thank you. submitted by /u/jms4607 [link] [comments]  ( 8 min )
    Uni of Alberta vs UCBerkeley vs Udacity Deep RL Course
    Hi, I want to do RL Courses with projects that I can add to my resume. Which of the following courses would be the best to work on:- UoA RL Course: https://www.coursera.org/specializations/reinforcement-learning UCB Deep RL Course: http://rail.eecs.berkeley.edu/deeprlcourse/ Udacity's Deep RL Course: https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893 More about me: I have a background in Robotics, deep learning for Computer Vision and a little NLP. I have worked with PyTorch and Tensorflow before. I currently work as a Computer Vision Engineer. submitted by /u/Pensive_Poetry [link] [comments]  ( 8 min )
    Buy one, get one free
    The two latest blog post have elaborated on the fact that when a function satisfies a second order differential equation, once you’ve evaluated the function and its derivative you get the second derivative for free, or at least for cheap. Sometimes functions are so closely related that software libraries bundle them together. For example, in […] Buy one, get one free first appeared on John D. Cook.  ( 6 min )
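    To make the post's point concrete with the simplest example: y = sin satisfies y'' = -y, so once y and y' are evaluated at a point, the second derivative is available with no further function evaluations:

    ```python
    import math

    # y = sin satisfies the ODE y'' = -y, so evaluating y and y' at a point
    # also hands you y'' "for free": y''(x) = -y(x).
    x = 1.2
    y, dy = math.sin(x), math.cos(x)   # the two evaluations you pay for
    d2y_free = -y                      # second derivative, no extra work

    # sanity check against a centered finite difference of y' = cos
    h = 1e-5
    d2y_numeric = (math.cos(x + h) - math.cos(x - h)) / (2 * h)
    print(abs(d2y_free - d2y_numeric) < 1e-8)  # True
    ```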
    What good is a DE once you have the solution?
    In an undergraduate course on differential equations, you see an equation, you find a solution. Wax on, wax off. It would be reasonable to assume that the differential equation is simply an obstacle to overcome and that once you have the solution, the differential equation has done its job and can be discarded. It would […] What good is a DE once you have the solution? first appeared on John D. Cook.  ( 6 min )
    Halley’s variation on Newton’s method
    Newton’s method for solving f(x) = 0 converges quickly once it starts to converge. Specifically, it converges quadratically: the error is roughly squared at each step. This means that the number of correct decimal places doubles at each iteration of Newton’s method. Doubling the number of decimal places is nice, but what if we could […] Halley’s variation on Newton’s method first appeared on John D. Cook.  ( 6 min )
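    For comparison, Halley's variation replaces Newton's update with x ← x − 2 f f′ / (2 f′² − f f″), buying cubic convergence: the number of correct digits roughly triples per step instead of doubling. A minimal sketch:

    ```python
    def halley(f, df, d2f, x, iters=5):
        """Halley's method: x <- x - 2 f f' / (2 f'^2 - f f'').
        Converges cubically once it starts to converge, i.e. correct
        digits roughly triple per iteration (vs. doubling for Newton)."""
        for _ in range(iters):
            fx, dfx = f(x), df(x)
            x = x - 2 * fx * dfx / (2 * dfx ** 2 - fx * d2f(x))
        return x

    # solve x^2 - 2 = 0, i.e. compute sqrt(2)
    root = halley(lambda x: x * x - 2,
                  lambda x: 2 * x,
                  lambda x: 2.0,
                  x=1.0)
    print(abs(root - 2 ** 0.5) < 1e-12)  # True
    ```

    The catch, of course, is that each step needs f″ as well as f′, which ties back to the previous post: when f satisfies a second-order ODE, that extra derivative is nearly free.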

    What is your favorite AI generated cover song so far?
    Wondering after seeing the post about Sinatra singin Gangsta’s Paradise. submitted by /u/ragipy [link] [comments]  ( 8 min )
    AI eliminated nearly 4,000 jobs in May, report says
    submitted by /u/facinabush [link] [comments]  ( 8 min )
    ChatGPT took their jobs. Now they walk dogs and fix air conditioners.
    submitted by /u/facinabush [link] [comments]  ( 8 min )
    Pondering the Next Frontier in AI
    I've recently been exploring SceneXplain, an AI tool that does image captioning, and it got me wondering: what's next for AI? AI has already far exceeded expectations: it's revolutionizing early detection and personalized treatments, it's optimizing production and supply chains, and it's even contributing to climate science. And with advancements in natural language understanding, we might soon have AI systems capable of nuanced social interactions. The possibilities are endless, and we’re probably at an era-defining moment here. So, where do you see AI tech heading next? submitted by /u/emotional_clearing [link] [comments]  ( 8 min )
    An idea for watermarking source code IP on projects uploaded to GitHub
    I'm not going to lie, GitHub is doing well for themselves, but I can't forgive them for the various examples I've seen where developers have seen GPT quote their code exactly from a private repository with almost no change, and of course, no attribution. I have thought of ideas including attribution systems for LLMs and it seems Microsoft and others are on top of this... but I can't help but still not trust any of these large corps. Even GitLab, however well they might do, might one day succumb to the larger corporations, and then what happens to any code I thought was private? So the idea is simple... create a database (decentralized or not is a political debate at this point), which issues unique identifiers and code patterns for your specific code, with a filter that randomly inserts …  ( 9 min )
    Do you think we're on the cusp of a technological revolution? Is AI about to affect just about every sector?
    Please tell me why or why not. I don't work in tech but am blown away by its capabilities. submitted by /u/ticketbroken [link] [comments]  ( 8 min )
    A Dog On Stage Rapping
    submitted by /u/Taylortro [link] [comments]  ( 8 min )
    Are there any AI projects that play a game for you and learn?
    Open source* submitted by /u/Failed_cocacola [link] [comments]  ( 8 min )
    Any good AI tools for transmitting text (or speech) into a voice changer?
    Hello all, I am currently making a promotional video for my webcomic. Would like the narration to be in a norse/viking voice. Was looking for AI tools to help me with this. submitted by /u/backwoodnav [link] [comments]  ( 8 min )
    Has anyone used leetcode for training data yet?
    As a passing observer of the AI space, it has always made sense to me that LeetCode would be the best coding database out there. The quality and organization of the data seem perfect. Not only would there be more than enough problems, but each problem would have multiple valid solutions, and those solutions would be organized by performance. I hear all the time that The Stack and Stack Overflow are used for LLMs, but why not LeetCode? This recent paper shows the value of smaller models trained on higher-quality data, so it would make sense that LeetCode would be the best, right? https://youtu.be/7S68y6huEpU submitted by /u/TrainquilOasis1423 [link] [comments]  ( 8 min )
    Dalle 2 vs Stable Diffusion XL 0.9
    Exact same prompt, surprisingly similar results. I remember a few months ago, when Dalle 2 was released, that Stable Diffusion was miles behind it. Now I'd say I actually prefer SD results over the Dalle 2 ones. submitted by /u/Chocolarion [link] [comments]  ( 8 min )
    Can you describe where and how Artificial Intelligence applications appear on an average day in your life?
    I'm interested in what your answer to this question would be. ^^ submitted by /u/1-1-3-1-1-4 [link] [comments]  ( 8 min )
    Needing some smart people’s insight into chatgpt…
    Need some big brain help. Long story short, I’m in the construction/renovation industry and looking for ways to utilize ChatGPT, its plugins, and additional resources to be able to input blueprints, plans, scopes of work, specs, and various information in order to generate compiled information, accurate material take-offs, estimates, and procedures for each project… am I getting ahead of myself, or is this possible? If so, where should I look at starting? submitted by /u/asduck86 [link] [comments]  ( 8 min )
    Our first update on our Generative Agent NPCs... It didn't go smoothly 😆 Update 1
    submitted by /u/Chance_Confection_37 [link] [comments]  ( 8 min )
    Help me with an AI Quiz
    I am making a quiz for people in my company with the topic "Ridiculous things made/not made by AI", where the participants have to guess whether whatever we show them was made by AI or not. If you have any questions/suggestions please lemme know. I'd appreciate it ;) submitted by /u/abhinav_sk [link] [comments]  ( 8 min )
    Nvidia’s H100: Funny L2, and Tons of Bandwidth
    submitted by /u/bartturner [link] [comments]  ( 8 min )
    AI singing is getting wild
    submitted by /u/Kingkrool1994 [link] [comments]  ( 8 min )
    What to learn about to understand potential limitations of AI?
    As much as I’m rooting for AGI to come in and help (potentially) fix a lot of the problems the human race has, I’d like to look more in depth at the other side of the story. I want to make it clear that I make no claims to be super knowledgeable about anything I’m saying here. I genuinely just want to learn more. It seems the commonly held belief foundation in this space is that number theory, computational theory, information theory, and whatever else related to computer science, results in this direction that we just need enough compute and the right algorithm to simulate an intelligence that surpasses the general intelligence of us humans. I know this is still up for debate and nothing is set in stone etc. From what I know about philosophy (which is also debated ofc), all science is based on empiricism, and if there is something we end up not being able to measure, then we will not be able to control or deal with that thing using science. Also, materialism and physicalism come to mind; there are obviously alternative beliefs to those. The hard problem of consciousness comes to mind. Is there something special to flesh? So, somewhere, there should be interesting discussion and analysis of what’s going on with AI, and therefore counters to the claims it can ever truly surpass our intelligence. I know there’s a lot of philosophy, but I’d prefer secondary sources since it can get pretty hard to digest. Counterpoints that directly address modern AI are appreciated just as much as old philosophy; I want as much perspective as possible. Also, good faith arguments without hate for the other side are preferred. The whole recent art community/tech controversy is just exhausting. I just want to be happy learning about the implications, not argumentative nor embittered! Thanks if you read my post. If this isn’t the right place to ask, I’d appreciate being pointed in a better direction. submitted by /u/AdaptivePerfection [link] [comments]  ( 9 min )
    [D] Haven't worked in 1.5 years, how to prepare to reenter the field?
    I quit my job as a Deep Learning Researcher at an AI Research Lab at a top hardware manufacturer a year and a half ago due to mental health issues. I’ve spent the time off strictly working on myself, no papers have been read, no code has been written, and I even bought an ATI GPU for gaming (heresy, I know). I’m finally getting to a place where I feel ready to rejoin the field and was wondering about thoughts on how to go about it. I spent 4.5 years at the lab, focused on computer vision in geospatial imagery, did some NLP, and some application development (mostly for PoCs, or non-profit collaborations). Published papers at CVPR, some journals, ran workshops at conferences, and led projects around AI for Good. Used a lot of PyTorch, TF, Kubernetes, Docker, etc. What would you do to touch base with the field and get ready to start interviewing for a position focused on deep learning — engineering or research (I’m gonna try for both and see what company is a better fit). I clearly need to go read up on ChatGPT, was gonna go through CVPR and NeurIPS papers, and was planning on reviewing my DL textbook as well as Hastie/Tibshirani, but I’m very much so out of the loop when it comes to current interview questions/trends in the field. What would you do? submitted by /u/chonbas [link] [comments]  ( 9 min )
    [D] Clustering time series data
    Hi, I am trying to cluster time series data (between 2 and 10 timepoints per series) and I am completely new to the field. I've just found the package "tslearn", which seems to be a suitable tool to accomplish the aforementioned task. As far as I understood, tslearn is able to perform both clustering and dynamic time warping at the same time. However, I am not entirely sure if it can handle missing data, and I would really like to avoid imputation of missing data in my project. Does anyone know if tslearn.clustering is able to deal with missing data? Also, are you aware of any alternative packages that I can use? Thanks in advance submitted by /u/_luqui [link] [comments]  ( 8 min )
    [P] testing the waters for a capable co-founder to help me finish development of an AI matchmaker
    Testing the waters, to see if I can find the right person to help me finish building. buzr.org Buzr's an AI matchmaker, but with some very weird twists, such as an onboarding process that happens by enabling the user to talk to their favorite hero/celeb. Similar to how (banterai.app) was made. Buzr's meant to be accessible to everyone from the start. Currently undecided whether Buzr will end up being open source and zero cost to the user, or for profit but with an insanely affordable price point that ensures just about anyone who wants to participate is able to. I'm almost finished with the backend (including the first iteration of training an open source model from HF). Honestly I have somewhere around 70 percent of the work needed to launch done. Definitely a few things, including the front end, still need some work. I’m still not entirely sure I’m going to be able to find the right person here, but I figure I’ll give it a shot. Ideally someone full stack (web) with some kind of ML background. [zack@humanity.rocks](mailto:zack@humanity.rocks) Twitter: zalehm buzr.org humanity.rocks submitted by /u/kingofthedoodles1 [link] [comments]  ( 9 min )
    What’s Your Problem? An Oft-Missing Section in AI Papers
    submitted by /u/tomssilver [link] [comments]  ( 8 min )
    [Research] Does anyone know a dataset which contains images of many invoices, and the associated text?
    I'm looking for a large amount of random invoices in various formats for some OCR and formatting recognition. submitted by /u/bettercallaCPA [link] [comments]  ( 8 min )
    [P] faster latent diffusion
    This is a method I thought of to potentially make diffusion-type models faster. The theoretical justification is the same as for "consistency models" but was developed independently. It works OK for MNIST, but that doesn't mean much. I don't have as many GPUs as OpenAI or MidJourney. - abbreviations NN = neural network LS = latent space background on diffusion NN autoencoders can be trained to convert between images and a LS where distance corresponds to image similarity. Like how most images are just "noise", most of that LS does not correspond to meaningful images. For simpler explanation, let's consider a simplified diffusion-based image generation model. It has a 2-dimensional LS, and 2 image categories: cats and dogs. https://preview.redd.it/9n9a89qkps9b1.png?width=900&for…  ( 10 min )
    [D]For Super resolution task with Diffusion, aren't the resolutions of input and output different?
    In the case of unconditional image synthesis using diffusion and training a U-Net, the input image's resolution and the resolution of the output image generated by the U-Net are generally the same. However, when training for the super-resolution task, the input image's resolution and the output image's resolution from the U-Net are different. How is this achieved? Is the final convolution layer of the U-Net adjusted? submitted by /u/ConfidentSprinkles54 [link] [comments]  ( 8 min )
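    Typically the resolutions are made to match before the U-Net ever sees them: in SR3-style super-resolution diffusion, the low-res image is first upsampled (e.g. bicubic or nearest) to the target size and then concatenated channel-wise with the noisy high-res input, so the U-Net stays resolution-preserving and only the input channel count of its first convolution changes. A toy shape-level sketch of this conditioning (names are mine):

```python
# Sketch of how SR diffusion keeps U-Net input/output resolutions equal
# (the SR3-style recipe): upsample the low-res image to the target size,
# then concatenate it channel-wise with the noisy high-res image. The
# U-Net itself is unchanged except for the first conv's input channels.
import numpy as np

def upsample_nearest(img, factor):
    """img: (C, H, W) -> (C, H*factor, W*factor)."""
    return img.repeat(factor, axis=1).repeat(factor, axis=2)

C, H, W, factor = 3, 16, 16, 4
low_res = np.random.rand(C, H, W)
noisy_high_res = np.random.rand(C, H * factor, W * factor)

conditioned = np.concatenate(
    [noisy_high_res, upsample_nearest(low_res, factor)], axis=0
)
print(conditioned.shape)  # (6, 64, 64): 2*C channels at target resolution
```

    So no final-layer adjustment is needed; the network always maps target-resolution tensors to target-resolution tensors.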
    [D] When less is more in the hierarchical forecasting case.
    There are many cases when simplicity is preferred over unnecessary complications. This was our experience in our ICML paper on hierarchically coherent networks for constrained probabilistic forecasting (HINT, https://arxiv.org/abs/2305.07089). We explored the accuracy of the latest AI-based hierarchical solutions, including AWS' HierE2E, and discovered that the Vector Autoregressive (VAR) approaches showed only modest improvements, or even deteriorations, over reconciled ARIMA. With our simplified HINT approach, we moved away from multivariate inputs and found 13 percent accuracy improvements over existing solutions (MinTrace, PERMBU, HierE2E) on four hierarchical datasets. Fitting Vector Autoregressive models for the hierarchical forecasting task can be challenging, as the models' parameters grow fast and spurious correlations characterize the setting. As always, let us know your thoughts. Experiments here; a GitHub star ⭐ is appreciated: https://nixtla.github.io/neuralforecast/examples/hierarchicalnetworks.html https://github.com/Nixtla/hierarchicalforecast/tree/main/experiments/hierarchical_baselines submitted by /u/Kinferatttu [link] [comments]  ( 8 min )
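    The coherence constraint at the heart of all these methods can be stated in a few lines. A toy illustration of the constraint itself (not HINT's implementation; see the linked repo for that):

```python
# Minimal illustration of hierarchical coherence via a summing matrix S:
# every aggregate series must equal the sum of its children, so mapping
# bottom-level forecasts through S yields a coherent hierarchy by
# construction.
import numpy as np

# Tiny hierarchy: total -> {A, B}, with bottom-level series A and B.
# Rows of S: total, A, B; columns: bottom-level series A, B.
S = np.array([[1, 1],
              [1, 0],
              [0, 1]])

bottom_forecast = np.array([10.0, 5.0])   # forecasts for A and B
coherent = S @ bottom_forecast            # [total, A, B]
print(coherent)                            # [15. 10.  5.]

# Coherence check: the total equals the sum of its children.
assert coherent[0] == coherent[1] + coherent[2]
```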
    [D] Machine Learning C books
    Are there any modern books on machine learning using C? I know there are a ton of older ones, but I'd prefer something more recent. The majority of the books I've come across are for Python, and of course the principles still apply, but I want to do it in C, although I'm comfortable with Python. Thanks for any help, and sorry if this question has been asked in the past. submitted by /u/_W0z [link] [comments]  ( 8 min )
    [R] Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing
    Project page: https://arielnlee.github.io/PatchMixing/ Twitter thread: https://twitter.com/natanielruizg/status/1675899509396631555?s=20 Paper: https://arxiv.org/abs//2306.17848 Transformers vs. ConvNets Abstract Vision transformers (ViTs) have significantly changed the computer vision landscape and have periodically exhibited superior performance in vision tasks compared to convolutional neural networks (CNNs). Although the jury is still out on which model type is superior, each has unique inductive biases that shape their learning and generalization performance. For example, ViTs have interesting properties with respect to early layer non-local feature dependence, as well as self-attention mechanisms which enhance learning flexibility, enabling them to ignore out-of-context image information more effectively. We hypothesize that this power to ignore out-of-context information (which we name patch selectivity), while integrating in-context information in a non-local manner in early layers, allows ViTs to more easily handle occlusion. In this study, our aim is to see whether we can have CNNs simulate this ability of patch selectivity by effectively hardwiring this inductive bias using Patch Mixing data augmentation, which consists of inserting patches from another image onto a training image and interpolating labels between the two image classes. Specifically, we use Patch Mixing to train state-of-the-art ViTs and CNNs, assessing its impact on their ability to ignore out-of-context patches and handle natural occlusions. We find that ViTs do not improve nor degrade when trained using Patch Mixing, but CNNs acquire new capabilities to ignore out-of-context information and improve on occlusion benchmarks, leaving us to conclude that this training method is a way of simulating in CNNs the abilities that ViTs already possess. We will release our Patch Mixing implementation and proposed datasets for public use. 
    submitted by /u/StrawberryNumberNine [link] [comments]  ( 9 min )
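    From the abstract's description, a rough sketch of the augmentation: replace a random subset of patches of one training image with patches from another and interpolate the labels by the mixed fraction. An illustration only; helper names are mine, not the authors' released implementation:

```python
# Rough sketch of the Patch Mixing idea: swap a random subset of
# non-overlapping patches of img_a for the corresponding patches of
# img_b, and mix the (one-hot) labels by the fraction of swapped area.
import numpy as np

def patch_mix(img_a, img_b, label_a, label_b, patch=4, mix_frac=0.25, rng=None):
    """img_*: (H, W, C) with H, W divisible by `patch`; labels: one-hot."""
    rng = np.random.default_rng(rng)
    h, w = img_a.shape[:2]
    gh, gw = h // patch, w // patch
    n_mix = int(mix_frac * gh * gw)
    mixed = img_a.copy()
    for c in rng.choice(gh * gw, size=n_mix, replace=False):
        i, j = (c // gw) * patch, (c % gw) * patch
        mixed[i:i + patch, j:j + patch] = img_b[i:i + patch, j:j + patch]
    lam = n_mix / (gh * gw)  # fraction of the image taken from img_b
    return mixed, (1 - lam) * label_a + lam * label_b

a, b = np.zeros((16, 16, 3)), np.ones((16, 16, 3))
img, lab = patch_mix(a, b, np.array([1., 0.]), np.array([0., 1.]), rng=0)
print(lab)  # with mix_frac=0.25: [0.75 0.25]
```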
    [P] Transferring thoughts to Chatbot - How would you build it ?
    I am building a "conversational chatbot" trained on my thoughts (just as an experiment) and want to know how you would do it. The idea is to have a chatbot in a few years that has: my views on different topics; the books I've read and am reading, where I summarise chapters and how I felt along the way; what the stages of building my startup were; etc. Everything that is happening now and for the next 18 months. It's like a diary you can converse with after a set amount of time. Let me know how you would go about building this. How complicated or simple would you make it? Which technologies? It would be fun to discuss more, I guess. submitted by /u/DragonflyLatter9068 [link] [comments]  ( 8 min )
    [R] Chat with documents using LangChain and OpenAI
    Over the past few months, I've been captivated by the flood of apps claiming to be the ultimate "ChatGPT for your documents" on Product Hunt. The question that lingered in my mind was, "How do these apps actually work?" Curiosity led me down an exciting path of discovery, and I stumbled upon a framework that I think is revolutionizing the world of app development in the context of Large Language Models - LangChain I learned that developing a "ChatGPT for your documents" is easily achievable through three broad workflows combined with access to OpenAI API. In fact, I went ahead and prototyped such a system on streamlit. Step 1 - The Setup: Store your documents as embeddings. In the first step, use Document Loaders (at least 100 are available), provided by LangChain to convert anything fr…  ( 9 min )
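    The retrieval step these apps share can be sketched without any framework. Below, a toy bag-of-words embedding stands in for a real embedding model (e.g. OpenAI's ada-002); this illustrates the embed-store-retrieve workflow, not LangChain's actual API:

```python
# Core of "chat with your documents": embed the chunks, embed the query,
# return the most similar chunks by cosine similarity, then stuff them
# into the LLM prompt. The bag-of-words `embed` is a toy stand-in for a
# real embedding model.
import math
from collections import Counter

def embed(text, vocab):
    """Toy stand-in for a real embedding model: normalized bag of words."""
    counts = Counter(text.lower().split())
    v = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def top_k(query, chunks, k=1):
    vocab = sorted({w for t in chunks + [query] for w in t.lower().split()})
    q = embed(query, vocab)
    scored = sorted(
        ((sum(a * b for a, b in zip(q, embed(c, vocab))), c) for c in chunks),
        reverse=True,
    )
    return [c for _, c in scored[:k]]

chunks = ["refunds are processed within 14 days",
          "our office is located in Berlin"]
print(top_k("how long do refunds take", chunks))
```

    A vector store (FAISS, Chroma, etc.) replaces the brute-force loop once the corpus grows.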
    [D] Is there a difference between p-tuning and prefix tuning ?
    If I understood correctly, p-tuning (from "GPT Understands, Too") and prefix-tuning (from "Prefix-Tuning: Optimizing Continuous Prompts for Generation") are mostly the same idea: learning continuous prompts via backpropagation. I've seen some differences (prefix-tuning only uses prefixes, while p-tuning uses a small model to generate the embeddings for the prompt), but mostly they seem very similar. Is that true? Do you know why https://huggingface.co/docs/peft/index implements both? Thanks in advance! submitted by /u/ez613 [link] [comments]  ( 8 min )
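    For intuition, the mechanic the two methods share reduces to inserting trainable "virtual token" vectors into the frozen model's input embeddings and training only those vectors. A shape-only sketch (no actual model; names are mine):

```python
# Shared mechanic of prefix-tuning and p-tuning: trainable virtual-token
# vectors joined to frozen token embeddings; only the virtual tokens
# receive gradients. Shapes only, no model here.
import numpy as np

d_model, n_virtual, seq_len = 8, 3, 5
prefix = np.random.randn(n_virtual, d_model) * 0.02  # trainable parameters
tokens = np.random.randn(seq_len, d_model)           # frozen embeddings

# Prefix-tuning: virtual tokens strictly at the front (and, in that
# paper, also prepended to keys/values at every attention layer).
prefix_tuned_input = np.concatenate([prefix, tokens], axis=0)
print(prefix_tuned_input.shape)  # (8, 8) = (n_virtual + seq_len, d_model)

# P-tuning instead produces the virtual tokens with a small prompt
# encoder (LSTM/MLP) and may interleave them at template positions,
# not only the front.
```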
    [D] What is currently the best theoretical book (or notes) about Convolutional Neural Networks?
    I have a degree in maths and I am interested in learning about convolutional neural networks from a mathematical/theoretical point of view. Can someone point me towards some resources? I already have a basic knowledge of convolutional neural networks (and other neural networks). submitted by /u/Wonderful_Energy_15 [link] [comments]  ( 8 min )
    [R] Observing Stable Results with Transfer Learning
    I have encountered an interesting phenomenon while working with StyleGAN2-ADA. Initially, I trained a model and observed that it "diverged". I say "diverged" because the generator seems to use repetitive colors in all synthetic images; nothing special, but it is noticeable when I generate large quantities of images. To investigate further, I decided to train a new model using a previous snapshot of the "diverged" model as a starting point. However, to my surprise, the new model, trained with transfer learning, appears to exhibit stable behavior without any signs of divergence. I am curious to understand why this discrepancy occurs. Could the initial model state or the transfer learning process itself play a role in preventing the divergence? Are there any underlying factors I might be missing when training a new model in this manner? Any insights or suggestions would be greatly appreciated. submitted by /u/Minute-Session8703 [link] [comments]  ( 8 min )
    [D] Starting MSc AI September, unsure what to do
    Hi folks, I'm about to start an MSc in AI; here's the course structure: https://www.kcl.ac.uk/study/postgraduate-taught/courses/artificial-intelligence-msc I've been studying the classic material at my own pace, like Andrew Ng's Coursera ML courses and a bit of fast.ai. I was also offered a place in a free 3-month data analyst bootcamp that finishes exactly before uni starts. Hence I've been wondering whether I should do this bootcamp or rather study on my own before the MSc. What do you recommend? submitted by /u/Wise_Technician_8475 [link] [comments]  ( 8 min )
    [D] Best embedding models for retrieving mixed natural language and source code?
    I'm building a semantic search for a technical mailing list where emails contain natural language and source code (short scripts and long patches). On the first try, I used OpenAI's embeddings (ada-002), and the results are amazing, but a free alternative that I can use locally without rate limits or a credit card would be great. Are there other embedding models that do not perform significantly worse in terms of quality on such a dataset? I have seen the MTEB Leaderboard but it does not seem to cover source code data. Thanks in advance! submitted by /u/LinqLover [link] [comments]  ( 8 min )
    [D] To all the machine learning engineers: most difficult model task/type you’ve ever had to work with?
    Got to say, for me it was putting an object detection model to use on videos in 2021: figuring out how to create a custom dataset, training a model on it without messing up some super obscure parameter in the config, and converting the weights to ONNX. submitted by /u/After_Magician_8438 [link] [comments]  ( 8 min )
    [P] Sweep: Open Source AI junior developer that writes and fixes its own pull requests
    Hi! I'm one of the developers of Sweep AI (YC S23). You can tell Sweep about the bugs and feature requests and Sweep will create a PR with code changes. You can use Sweep Chat to ideate and clarify requirements with Sweep, eventually creating a pull request. Then anywhere GitHub takes text (PR comments, code comments), Sweep can read it and tweak the PR. We built Sweep by integrating search + GPT4-32k into Github, with a couple more tricks :D. To find out more: Get started with Sweep at https://github.com/sweepai/sweep#-getting-started See more details at https://sweep.dev/ and our docs at https://docs.sweep.dev/ submitted by /u/williamsweep [link] [comments]  ( 8 min )
    [P] Longformer for large document summarization
    Does anyone here have experience training a Longformer model for large-document summarization? I am working on a project, and after doing a solid amount of research I've decided to train a Longformer model to summarize large documents for a specific industry. I was wondering if anyone has experience using it, and with the hyperparameters. I have not started the training just yet and am in the process of converting the documents and tokenizing them. submitted by /u/Xeranter [link] [comments]  ( 8 min )
    [D] Vertex AI Endpoint Alternative
    Hello everyone! New to the ML world. I recently started using Vertex AI and AutoML for some personal projects and was amazed at how simply and effectively their AutoML feature worked. It accomplished what I needed it to do with text classification. The problem came in when I learned that leveraging the Model Endpoint to perform API predictions against the trained model is $5 per 1,000 characters. This can get very costly depending on, for example, how many requests there are a day. Is there an alternative to this whole scenario, or a way to leverage my own endpoint that's cheaper for running prediction API requests? submitted by /u/ywb_win [link] [comments]  ( 8 min )
    [R] Classifier-Free Guidance can be applied to LLMs too. It generally gives results of a model twice the size you apply it to. New SotA on LAMBADA with LLaMA-7B over PaLM-540B and plenty other experimental results.
    submitted by /u/Affectionate-Fish241 [link] [comments]  ( 8 min )
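    The reweighting itself is one line: extrapolate from the unconditional toward the conditional next-token logits. A toy sketch of the idea (illustrative numbers; a real use needs two forward passes of the LM, with and without the conditioning text, and this is not the paper's exact code):

```python
# Sketch of classifier-free guidance applied to next-token logits:
# gamma = 1 recovers ordinary conditional sampling; gamma > 1 upweights
# tokens that the prompt makes more likely relative to the prior.
import numpy as np

def cfg_logits(cond_logits, uncond_logits, gamma):
    return uncond_logits + gamma * (cond_logits - uncond_logits)

cond = np.array([2.0, 0.0, -1.0])    # logits given the prompt
uncond = np.array([1.0, 0.5, -1.0])  # logits without the prompt
print(cfg_logits(cond, uncond, gamma=1.0))  # [ 2.    0.   -1.  ]
print(cfg_logits(cond, uncond, gamma=1.5))  # [ 2.5  -0.25 -1.  ]
```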

    Three advantages of non-AI models
    It’s hard to imagine doing anything like Midjourney’s image generation without neural networks. The same is true of ChatGPT’s text generation. But a lot of business tasks do not require AI, and in fact would be better off not using AI. Three reasons why: Statistical models are easier to interpret. When a model has a […] Three advantages of non-AI models first appeared on John D. Cook.  ( 5 min )
    Eccentricity of bivariate normal level sets
    Suppose you have a bivariate normal distribution with correlation ρ, where both marginals have unit variance. Then the level sets of the density function are ellipses, and the eccentricity e of the ellipses is related to the correlation ρ by e² = 2|ρ| / (1 + |ρ|). Plots: For example, suppose ρ = 0.8. Here's a plot of the density function f(x, y). And here is a […] Eccentricity of bivariate normal level sets first appeared on John D. Cook.  ( 5 min )
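    A sketch of where the relationship comes from, assuming unit-variance marginals (the ellipse axes fall along y = ±x, and the semi-axes come from the eigenvalues of the quadratic form in the exponent):

```latex
% Level sets of the density satisfy x^2 - 2\rho xy + y^2 = c; the matrix
% of this quadratic form has eigenvalues 1 - \rho and 1 + \rho, so the
% squared semi-axes are proportional to 1/(1 - |\rho|) and 1/(1 + |\rho|).
\[
\frac{b^2}{a^2} = \frac{1 - |\rho|}{1 + |\rho|},
\qquad
e^2 = 1 - \frac{b^2}{a^2} = \frac{2|\rho|}{1 + |\rho|}.
\]
```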

    AI, Digital Twins to Unleash Next Wave of Climate Research Innovation
    AI and accelerated computing will help climate researchers achieve the miracles they need to achieve breakthroughs in climate research, NVIDIA founder and CEO Jensen Huang said during a keynote Monday at the Berlin Summit for the Earth Virtualization Engines initiative. “Richard Feynman once said that ‘what I can’t create, I don’t understand’ and that’s the Read article >  ( 6 min )

    Retain original PDF formatting to view translated documents with Amazon Textract, Amazon Translate, and PDFBox
    Companies across various industries create, scan, and store large volumes of PDF documents. In many cases, the content is text-heavy and often written in a different language and requires translation. To address this, you need an automated solution to extract the contents within these PDFs and translate them quickly and cost-efficiently. Many businesses have diverse […]  ( 8 min )
    Auto-labeling module for deep learning-based Advanced Driver Assistance Systems on AWS
    In computer vision (CV), adding tags to identify objects of interest or bounding boxes to locate the objects is called labeling. It’s one of the prerequisite tasks to prepare training data to train a deep learning model. Hundreds of thousands of work hours are spent generating high-quality labels from images and videos for various CV […]  ( 9 min )

    The Golden Rule and the AI Utility Function – Part I
    “Do unto others as you would have them do unto you.” The Golden Rule.  This may have been the first Sunday School lesson I learned (thanks, Mrs. Monroe). The Golden Rule is a moral principle that states that you should treat others as you would like others to treat you. It is a foundational principle… Read More »The Golden Rule and the AI Utility Function – Part I The post The Golden Rule and the AI Utility Function – Part I appeared first on Data Science Central.  ( 22 min )

    Interpreting loss curves & returns in DDQN
    I'm training a DDQN agent to play a simple video game based on pixel inputs. Cumulative returns (blue, left) and predicted Q-values on a hold-out set (right, green) increase over the course of training (as expected). However, I don't understand the progression of the loss (centre, red). Can you help me make sense of the loss function? The reason I'm confused is that: With policy-learning, I'm used to tracking only the returns and (some measure of) validation performance because the loss function doesn't really track returns. However with Q-learning, the loss is simply the distance between the predicted and bootstrapped state values. Both are optimised over the course of learning. But shouldn't their estimates converge over time, and therefore the loss should go down? Thanks for your help! Average per-episode (a) returns and (b) training loss, and (c) Q-value predicted on hold-out set. Each is measured at the end of every episode during training. submitted by /u/desperateEfforts1 [link] [comments]  ( 8 min )
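    One way to see why this loss need not fall: the regression target itself is recomputed from the improving networks, so the loss tracks a moving objective rather than a fixed label set, as in supervised learning. A toy sketch of the DDQN target computation (names are illustrative):

```python
# DDQN target: r + gamma * Q_target(s', argmax_a Q_online(s', a)).
# As Q_online improves (and Q_target is periodically synced to it), the
# targets grow along with the predictions, so their gap (the TD loss)
# does not have to shrink even while returns rise.
import numpy as np

def ddqn_targets(rewards, q_online_next, q_target_next, gamma=0.99):
    """q_*_next: (batch, n_actions) Q-values for the next states."""
    best_actions = q_online_next.argmax(axis=1)  # online net selects...
    bootstrap = q_target_next[np.arange(len(rewards)), best_actions]  # ...target net evaluates
    return rewards + gamma * bootstrap

rewards = np.array([1.0, 0.0])
q_online_next = np.array([[0.2, 0.9], [0.5, 0.1]])
q_target_next = np.array([[0.3, 0.8], [0.4, 0.2]])
print(ddqn_targets(rewards, q_online_next, q_target_next))
# [1.0 + 0.99*0.8, 0.0 + 0.99*0.4] = [1.792, 0.396]
```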
    Stable baselines! Where my people at?
    Hello. I think this is a great library. I am actively using it in some research. I have questions about callbacks, and I want to learn more so I can make some extensions. Is there a community? The GitHub page mentions this sub and a Discord. submitted by /u/Independent-Scale564 [link] [comments]  ( 8 min )

    The Bayesian Learning Rule. (arXiv:2107.04562v3 [stat.ML] UPDATED)
    We show that many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule. The rule, derived from Bayesian principles, yields a wide-range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton's method, and Kalman filter, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout. The key idea in deriving such algorithms is to approximate the posterior using candidate distributions estimated by using natural gradients. Different candidate distributions result in different algorithms and further approximations to natural gradients give rise to variants of those algorithms. Our work not only unifies, generalizes, and improves existing algorithms, but also helps us design new ones.
    Cooperative Multi-Agent Deep Reinforcement Learning for Reliable and Energy-Efficient Mobile Access via Multi-UAV Control. (arXiv:2210.00945v2 [cs.LG] UPDATED)
    This paper addresses a novel multi-agent deep reinforcement learning (MADRL)-based positioning algorithm for multiple unmanned aerial vehicles (UAVs) collaboration (i.e., UAVs work as mobile base stations). The primary objective of the proposed algorithm is to establish dependable mobile access networks for cellular vehicle-to-everything (C-V2X) communication, thereby facilitating the realization of high-quality intelligent transportation systems (ITS). The reliable mobile access services can be achieved in following two ways, i.e., i) energy-efficient UAV operation and ii) reliable wireless communication services. For energy-efficient UAV operation, the reward of our proposed MADRL algorithm contains the features for UAV energy consumption models in order to realize efficient operations. Furthermore, for reliable wireless communication services, the quality of service (QoS) requirements of individual users are considered as a part of rewards and 60GHz mmWave radio is used for mobile access. This paper considers the 60GHz mmWave access for utilizing the benefits of i) ultra-wide-bandwidth for multi-Gbps high-speed communications and ii) high-directional communications for spatial reuse that is obviously good for densely deployed users. Lastly, the comprehensive and data-intensive performance evaluation of the proposed MADRL-based algorithm for multi-UAV positioning is conducted in this paper. The results of these evaluations demonstrate that the proposed algorithm outperforms other existing algorithms.
    Sphere2Vec: A General-Purpose Location Representation Learning over a Spherical Surface for Large-Scale Geospatial Predictions. (arXiv:2306.17624v1 [cs.CV])
    Generating learning-friendly representations for points in space is a fundamental and long-standing problem in ML. Recently, multi-scale encoding schemes (such as Space2Vec and NeRF) were proposed to directly encode any point in 2D/3D Euclidean space as a high-dimensional vector, and has been successfully applied to various geospatial prediction and generative tasks. However, all current 2D and 3D location encoders are designed to model point distances in Euclidean space. So when applied to large-scale real-world GPS coordinate datasets, which require distance metric learning on the spherical surface, both types of models can fail due to the map projection distortion problem (2D) and the spherical-to-Euclidean distance approximation error (3D). To solve these problems, we propose a multi-scale location encoder called Sphere2Vec which can preserve spherical distances when encoding point coordinates on a spherical surface. We developed a unified view of distance-reserving encoding on spheres based on the DFS. We also provide theoretical proof that the Sphere2Vec preserves the spherical surface distance between any two points, while existing encoding schemes do not. Experiments on 20 synthetic datasets show that Sphere2Vec can outperform all baseline models on all these datasets with up to 30.8% error rate reduction. We then apply Sphere2Vec to three geo-aware image classification tasks - fine-grained species recognition, Flickr image recognition, and remote sensing image classification. Results on 7 real-world datasets show the superiority of Sphere2Vec over multiple location encoders on all three tasks. Further analysis shows that Sphere2Vec outperforms other location encoder models, especially in the polar regions and data-sparse areas because of its nature for spherical surface distance preservation. Code and data are available at https://gengchenmai.github.io/sphere2vec-website/.
    Sufficient-Statistic Memory AMP. (arXiv:2112.15327v4 [cs.IT] UPDATED)
    Approximate message passing (AMP) type algorithms have been widely used in the signal reconstruction of certain large random linear systems. A key feature of the AMP-type algorithms is that their dynamics can be correctly described by state evolution. While state evolution is a useful analytic tool, its convergence is not guaranteed. To solve the convergence problem of the state evolution of AMP-type algorithms in principle, this paper proposes a sufficient-statistic memory AMP (SS-MAMP) algorithm framework under the conditions of right-unitarily invariant sensing matrices, Lipschitz-continuous local processors and the sufficient-statistic constraint (i.e., the current message of each local processor is a sufficient statistic of the signal vector given the current and all preceding messages). We show that the covariance matrices of SS-MAMP are L-banded and convergent, which is an optimal framework (from the local MMSE/LMMSE perspective) for AMP-type algorithms given the Lipschitz-continuous local processors. Given an arbitrary MAMP, we can construct an SS-MAMP by damping, which not only ensures the convergence of the state evolution, but also preserves the orthogonality, i.e., its dynamics can be correctly described by state evolution. As a byproduct, we prove that the Bayes-optimal orthogonal/vector AMP (BO-OAMP/VAMP) is an SS-MAMP. As an example, we construct a sufficient-statistic Bayes-optimal MAMP (SS-BO-MAMP) whose state evolution converges to the minimum (i.e., Bayes-optimal) mean square error (MSE) predicted by replica methods when it has a unique fixed point. In addition, the MSE of SS-BO-MAMP is not worse than the original BO-MAMP. Finally, simulations are provided to support the theoretical results.
    DisPlacing Objects: Improving Dynamic Vehicle Detection via Visual Place Recognition under Adverse Conditions. (arXiv:2306.17536v1 [cs.CV])
    Can knowing where you are assist in perceiving objects in your surroundings, especially under adverse weather and lighting conditions? In this work we investigate whether a prior map can be leveraged to aid in the detection of dynamic objects in a scene without the need for a 3D map or pixel-level map-query correspondences. We contribute an algorithm which refines an initial set of candidate object detections and produces a refined subset of highly accurate detections using a prior map. We begin by using visual place recognition (VPR) to retrieve a reference map image for a given query image, then use a binary classification neural network that compares the query and mapping image regions to validate the query detection. Once our classification network is trained, on approximately 1000 query-map image pairs, it is able to improve the performance of vehicle detection when combined with an existing off-the-shelf vehicle detector. We demonstrate our approach using standard datasets across two cities (Oxford and Zurich) under different settings of train-test separation of map-query traverse pairs. We further emphasize the performance gains of our approach against alternative design choices and show that VPR suffices for the task, eliminating the need for precise ground truth localization.
    Expressive architectures enhance interpretability of dynamics-based neural population models. (arXiv:2212.03771v4 [q-bio.NC] UPDATED)
    Artificial neural networks that can recover latent dynamics from recorded neural activity may provide a powerful avenue for identifying and interpreting the dynamical motifs underlying biological computation. Given that neural variance alone does not uniquely determine a latent dynamical system, interpretable architectures should prioritize accurate and low-dimensional latent dynamics. In this work, we evaluated the performance of sequential autoencoders (SAEs) in recovering latent chaotic attractors from simulated neural datasets. We found that SAEs with widely-used recurrent neural network (RNN)-based dynamics were unable to infer accurate firing rates at the true latent state dimensionality, and that larger RNNs relied upon dynamical features not present in the data. On the other hand, SAEs with neural ordinary differential equation (NODE)-based dynamics inferred accurate rates at the true latent state dimensionality, while also recovering latent trajectories and fixed point structure. Ablations reveal that this is mainly because NODEs (1) allow use of higher-capacity multi-layer perceptrons (MLPs) to model the vector field and (2) predict the derivative rather than the next state. Decoupling the capacity of the dynamics model from its latent dimensionality enables NODEs to learn the requisite low-D dynamics where RNN cells fail. Additionally, the fact that the NODE predicts derivatives imposes a useful autoregressive prior on the latent states. The suboptimal interpretability of widely-used RNN-based dynamics may motivate substitution for alternative architectures, such as NODE, that enable learning of accurate dynamics in low-dimensional latent spaces.
    Fact or Artifact? Revise Layer-wise Relevance Propagation on various ANN Architectures. (arXiv:2302.12317v2 [cs.LG] UPDATED)
    Layer-wise relevance propagation (LRP) is a widely used and powerful technique to reveal insights into various artificial neural network (ANN) architectures. LRP is often used in the context of image classification. The aim is to understand, which parts of the input sample have highest relevance and hence most influence on the model prediction. Relevance can be traced back through the network to attribute a certain score to each input pixel. Relevance scores are then combined and displayed as heat maps and give humans an intuitive visual understanding of classification models. Opening the black box to understand the classification engine in great detail is essential for domain experts to gain trust in ANN models. However, there are pitfalls in terms of model-inherent artifacts included in the obtained relevance maps, that can easily be missed. But for a valid interpretation, these artifacts must not be ignored. Here, we apply and revise LRP on various ANN architectures trained as classifiers on geospatial and synthetic data. Depending on the network architecture, we show techniques to control model focus and give guidance to improve the quality of obtained relevance maps to separate facts from artifacts.
    GRIL: A $2$-parameter Persistence Based Vectorization for Machine Learning. (arXiv:2304.04970v2 [cs.LG] UPDATED)
    $1$-parameter persistent homology, a cornerstone in Topological Data Analysis (TDA), studies the evolution of topological features such as connected components and cycles hidden in data. It has been applied to enhance the representation power of deep learning models, such as Graph Neural Networks (GNNs). To enrich the representations of topological features, here we propose to study $2$-parameter persistence modules induced by bi-filtration functions. In order to incorporate these representations into machine learning models, we introduce a novel vector representation called Generalized Rank Invariant Landscape (GRIL) for $2$-parameter persistence modules. We show that this vector representation is $1$-Lipschitz stable and differentiable with respect to underlying filtration functions and can be easily integrated into machine learning models to augment encoding topological features. We present an algorithm to compute the vector representation efficiently. We also test our methods on synthetic and benchmark graph datasets, and compare the results with previous vector representations of $1$-parameter and $2$-parameter persistence modules. Further, we augment GNNs with GRIL features and observe an increase in performance indicating that GRIL can capture additional features enriching GNNs. We make the complete code for the proposed method available at https://github.com/soham0209/mpml-graph.
    Online Learning with Set-Valued Feedback. (arXiv:2306.06247v2 [cs.LG] UPDATED)
    We study a variant of online multiclass classification where the learner predicts a single label but receives a set of labels as feedback. In this model, the learner is penalized for not outputting a label contained in the revealed set. We show that unlike online multiclass learning with single-label feedback, deterministic and randomized online learnability are not equivalent even in the realizable setting with set-valued feedback. Accordingly, we give two new combinatorial dimensions, named the Set Littlestone and Measure Shattering dimensions, that tightly characterize deterministic and randomized online learnability respectively in the realizable setting. In addition, we show that the Measure Shattering dimension tightly characterizes online learnability in the agnostic setting. Finally, we show that practical learning settings like online multilabel ranking, online multilabel classification, and online interval learning are specific instances of our general framework.
    UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers. (arXiv:2301.13741v3 [cs.CV] UPDATED)
    Real-world data contains a vast amount of multimodal information, among which vision and language are the two most representative modalities. Moreover, increasingly heavy models, \textit{e}.\textit{g}., Transformers, have attracted researchers' attention to model compression. However, how to compress multimodal models, especially vision-language Transformers, remains under-explored. This paper proposes \textbf{U}nified and \textbf{P}r\textbf{o}gressive \textbf{P}runing (\textbf{\emph{UPop}}) as a universal vision-language Transformer compression framework, which incorporates 1) unified searching of multimodal subnets in a continuous optimization space from the original model, which enables automatic assignment of pruning ratios among compressible modalities and structures; and 2) progressive searching and retraining of the subnet, which maintains convergence between the search and retrain stages to attain higher compression ratios. Experiments on various tasks, datasets, and model architectures demonstrate the effectiveness and versatility of the proposed UPop framework. The code is available at https://github.com/sdc17/UPop.
    First-order ANIL learns linear representations despite misspecified latent dimension. (arXiv:2303.01335v2 [cs.LG] UPDATED)
    Due to its empirical success in few-shot classification and reinforcement learning, meta-learning has recently received significant interest. Meta-learning methods leverage data from previous tasks to learn a new task in a sample-efficient manner. In particular, model-agnostic methods look for initialisation points from which gradient descent quickly adapts to any new task. Although it has been empirically suggested that such methods perform well by learning shared representations during pretraining, there is limited theoretical evidence of such behavior. More importantly, it has not been rigorously shown that these methods still learn a shared structure, despite architectural misspecifications. In this direction, this work shows, in the limit of an infinite number of tasks, that first-order ANIL with a linear two-layer network architecture successfully learns linear shared representations. This result even holds with a misspecified network parameterisation; having a width larger than the dimension of the shared representations results in an asymptotically low-rank solution. The learnt solution then yields a good adaptation performance on any new task after a single gradient step. Overall this illustrates how well model-agnostic methods such as first-order ANIL can learn shared representations.
    On the Existence of a Complexity in Fixed Budget Bandit Identification. (arXiv:2303.09468v2 [stat.ML] UPDATED)
    In fixed budget bandit identification, an algorithm sequentially observes samples from several distributions up to a given final time. It then answers a query about the set of distributions. A good algorithm will have a small probability of error. While that probability decreases exponentially with the final time, the best attainable rate is not known precisely for most identification tasks. We show that if a fixed budget task admits a complexity, defined as a lower bound on the probability of error which is attained by the same algorithm on all bandit problems, then that complexity is determined by the best non-adaptive sampling procedure for that problem. We show that there is no such complexity for several fixed budget identification tasks including Bernoulli best arm identification with two arms: there is no single algorithm that attains everywhere the best possible rate.
    Precision Anti-Cancer Drug Selection via Neural Ranking. (arXiv:2306.17771v1 [cs.LG])
    Personalized cancer treatment requires a thorough understanding of complex interactions between drugs and cancer cell lines in varying genetic and molecular contexts. To address this, high-throughput screening has been used to generate large-scale drug response data, facilitating data-driven computational models. Such models can capture complex drug-cell line interactions across various contexts in a fully data-driven manner. However, accurately prioritizing the most sensitive drugs for each cell line remains a significant challenge. To address this, we developed neural ranking approaches that leverage large-scale drug response data across multiple cell lines from diverse cancer types. Unlike existing approaches that primarily utilize regression and classification techniques for drug response prediction, we formulated the objective of drug selection and prioritization as a drug ranking problem. In this work, we proposed two neural listwise ranking methods that learn latent representations of drugs and cell lines, and then use those representations to score drugs in each cell line via a learnable scoring function. Specifically, we developed a neural listwise ranking method, List-One, on top of the existing method ListNet. Additionally, we proposed a novel listwise ranking method, List-All, that focuses on all the sensitive drugs instead of only the top sensitive drug, unlike List-One. Our results demonstrate that List-All outperforms the best baseline with significant improvements of as much as 8.6% in hit@20 across 50% of the test cell lines. Furthermore, our analyses suggest that the learned latent spaces from our proposed methods demonstrate informative clustering structures and capture relevant underlying biological features. Moreover, our comprehensive empirical evaluation provides a thorough and objective comparison of the performance of different methods (including our proposed ones).
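The ListNet loss that List-One builds on can be written down compactly; the following is a generic sketch of ListNet's standard top-one cross-entropy objective (our own illustration, not the authors' code):

```python
import math

# Generic ListNet top-one loss: cross-entropy between the softmax of the
# true relevance scores and the softmax of the predicted scores. Lower loss
# means the predicted ordering better matches the true one.

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def listnet_top_one_loss(pred_scores, true_relevances):
    p_true = softmax(true_relevances)
    p_pred = softmax(pred_scores)
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred))
```

A correctly ordered score list yields a strictly lower loss than a reversed one, which is what makes the objective usable for drug prioritization.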
    See Through the Fog: Curriculum Learning with Progressive Occlusion in Medical Imaging. (arXiv:2306.15574v2 [cs.CV] UPDATED)
    In recent years, deep learning models have revolutionized medical image interpretation, offering substantial improvements in diagnostic accuracy. However, these models often struggle with challenging images where critical features are partially or fully occluded, which is a common scenario in clinical practice. In this paper, we propose a novel curriculum learning-based approach to train deep learning models to handle occluded medical images effectively. Our method progressively introduces occlusion, starting from clear, unobstructed images and gradually moving to images with increasing occlusion levels. This ordered learning process, akin to human learning, allows the model to first grasp simple, discernable patterns and subsequently build upon this knowledge to understand more complicated, occluded scenarios. Furthermore, we present three novel occlusion synthesis methods, namely Wasserstein Curriculum Learning (WCL), Information Adaptive Learning (IAL), and Geodesic Curriculum Learning (GCL). Our extensive experiments on diverse medical image datasets demonstrate substantial improvements in model robustness and diagnostic accuracy over conventional training methodologies.
    DUET: 2D Structured and Approximately Equivariant Representations. (arXiv:2306.16058v2 [cs.LG] UPDATED)
    Multiview Self-Supervised Learning (MSSL) is based on learning invariances with respect to a set of input transformations. However, invariance partially or totally removes transformation-related information from the representations, which might harm performance for specific downstream tasks that require such information. We propose 2D strUctured and EquivarianT representations (coined DUET), which are 2D representations organized in a matrix structure, and equivariant with respect to transformations acting on the input data. DUET representations maintain information about an input transformation, while remaining semantically expressive. Compared to SimCLR (Chen et al., 2020) (unstructured and invariant) and ESSL (Dangovski et al., 2022) (unstructured and equivariant), the structured and equivariant nature of DUET representations enables controlled generation with lower reconstruction error, while controllability is not possible with SimCLR or ESSL. DUET also achieves higher accuracy for several discriminative tasks, and improves transfer learning.
    GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond. (arXiv:2211.01962v4 [cs.LG] UPDATED)
    We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making, which includes Markov decision process (MDP), partially observable Markov decision process (POMDP), and predictive state representation (PSR) as special cases. Toward finding the minimum assumption that empowers sample efficient learning, we propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation in online interactive decision making. Specifically, GEC captures the hardness of exploration by comparing the error of predicting the performance of the updated policy with the in-sample training error evaluated on the historical data. We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR, where generalized regular PSR, a new tractable PSR class identified by us, includes nearly all known tractable POMDPs and PSRs. Furthermore, in terms of algorithm design, we propose a generic posterior sampling algorithm, which can be implemented in both model-free and model-based fashion, under both fully observable and partially observable settings. The proposed algorithm modifies the standard posterior sampling algorithm in two aspects: (i) we use an optimistic prior distribution that biases towards hypotheses with higher values and (ii) a log-likelihood function is set to be the empirical loss evaluated on the historical data, where the choice of loss function supports both model-free and model-based learning. We prove that the proposed algorithm is sample efficient by establishing a sublinear regret upper bound in terms of GEC. In summary, we provide a new and unified understanding of both fully observable and partially observable RL.
    Selecting Robust Features for Machine Learning Applications using Multidata Causal Discovery. (arXiv:2304.05294v5 [stat.ML] UPDATED)
    Robust feature selection is vital for creating reliable and interpretable Machine Learning (ML) models. When designing statistical prediction models in cases where domain knowledge is limited and underlying interactions are unknown, choosing the optimal set of features is often difficult. To mitigate this issue, we introduce a Multidata (M) causal feature selection approach that simultaneously processes an ensemble of time series datasets and produces a single set of causal drivers. This approach uses the causal discovery algorithms PC1 or PCMCI that are implemented in the Tigramite Python package. These algorithms utilize conditional independence tests to infer parts of the causal graph. Our causal feature selection approach filters out causally-spurious links before passing the remaining causal features as inputs to ML models (Multiple linear regression, Random Forest) that predict the targets. We apply our framework to the statistical intensity prediction of Western Pacific Tropical Cyclones (TC), for which it is often difficult to accurately choose drivers and their dimensionality reduction (time lags, vertical levels, and area-averaging). Using more stringent significance thresholds in the conditional independence tests helps eliminate spurious causal relationships, thus helping the ML model generalize better to unseen TC cases. M-PC1 with a reduced number of features outperforms M-PCMCI, non-causal ML, and other feature selection methods (lagged correlation, random), even slightly outperforming feature selection based on eXplainable Artificial Intelligence. The optimal causal drivers obtained from our causal feature selection help improve our understanding of underlying relationships and suggest new potential drivers of TC intensification.
    Vision Through the Veil: Differential Privacy in Federated Learning for Medical Image Classification. (arXiv:2306.17794v1 [cs.LG])
    The proliferation of deep learning applications in healthcare calls for data aggregation across various institutions, a practice often associated with significant privacy concerns. This concern intensifies in medical image analysis, where privacy-preserving mechanisms are paramount due to the data being sensitive in nature. Federated learning, which enables cooperative model training without direct data exchange, presents a promising solution. Nevertheless, the inherent vulnerabilities of federated learning necessitate further privacy safeguards. This study addresses this need by integrating differential privacy, a leading privacy-preserving technique, into a federated learning framework for medical image classification. We introduce a novel differentially private federated learning model and meticulously examine its impacts on privacy preservation and model performance. Our research confirms the existence of a trade-off between model accuracy and privacy settings. However, we demonstrate that strategic calibration of the privacy budget in differential privacy can uphold robust image classification performance while providing substantial privacy protection.
    Case-Base Neural Networks: survival analysis with time-varying, higher-order interactions. (arXiv:2301.06535v2 [stat.ML] UPDATED)
    Neural network-based survival methods can model data-driven covariate interactions. While these methods can provide better predictive performance than regression-based approaches, not all can model time-varying interactions and complex baseline hazards. To address this, we propose Case-Base Neural Networks (CBNNs) as a new approach that combines the case-base sampling framework with flexible neural network architectures. Using a novel sampling scheme and data augmentation to naturally account for censoring, we construct a feed-forward neural network that may take time as an input. CBNNs predict the probability of an event occurring at a given moment to estimate the hazard function. We compare the performance of CBNNs to regression and neural network-based survival methods in a simulation and three case studies using two time-dependent metrics. First, we examine performance on a simulation involving a complex baseline hazard and time-varying interactions to assess all methods, with CBNN outperforming competitors. Then, we apply all methods to three real data applications, with CBNNs outperforming the competing models in two studies and showing similar performance in the third. Our results highlight the benefit of combining case-base sampling with deep learning to provide a simple and flexible modeling framework for data-driven, time-varying interaction modeling of single event survival outcomes. An R package is available at https://github.com/Jesse-Islam/cbnn.
    Neural Characteristic Activation Value Analysis for Improved ReLU Network Feature Learning. (arXiv:2305.15912v2 [cs.LG] UPDATED)
    We examine the characteristic activation values of individual ReLU units in neural networks. We refer to the corresponding set for such characteristic activation values in the input space as the characteristic activation set of a ReLU unit. We draw an explicit connection between the characteristic activation set and learned features in ReLU networks. This connection leads to new insights into why various neural network normalization techniques used in modern deep learning architectures regularize and stabilize SGD optimization. Utilizing these insights, we propose a geometric approach to parameterize ReLU networks for improved feature learning. We empirically verify its usefulness with less carefully chosen initialization schemes and larger learning rates. We report improved optimization stability, faster convergence speed, and better generalization performance.
    Off-Policy Evaluation with Out-of-Sample Guarantees. (arXiv:2301.08649v3 [stat.ML] UPDATED)
    We consider the problem of evaluating the performance of a decision policy using past observational data. The outcome of a policy is measured in terms of a loss (aka. disutility or negative reward) and the main problem is making valid inferences about its out-of-sample loss when the past data was observed under a different and possibly unknown policy. Using a sample-splitting method, we show that it is possible to draw such inferences with finite-sample coverage guarantees about the entire loss distribution, rather than just its mean. Importantly, the method takes into account model misspecifications of the past policy - including unmeasured confounding. The evaluation method can be used to certify the performance of a policy using observational data under a specified range of credible model assumptions.
    Averaged Method of Multipliers for Bi-Level Optimization without Lower-Level Strong Convexity. (arXiv:2302.03407v2 [math.OC] UPDATED)
    Gradient methods have become mainstream techniques for Bi-Level Optimization (BLO) in learning fields. The validity of existing works relies heavily on a restrictive Lower-Level Strong Convexity (LLSC) condition, on solving a series of approximation subproblems with high accuracy, or on both. In this work, by averaging the upper and lower level objectives, we propose a single-loop Bi-level Averaged Method of Multipliers (sl-BAMM) for BLO that is simple yet efficient for large-scale BLO and dispenses with the limiting LLSC restriction. We further provide a non-asymptotic convergence analysis of sl-BAMM towards KKT stationary points, and the comparative advantage of our analysis lies in the absence of the strong gradient boundedness assumption that other analyses always require. Thus our theory safely captures a wider variety of applications in deep learning, especially those where the upper-level objective is quadratic w.r.t. the lower-level variable. Experimental results demonstrate the superiority of our method.
    Uncertainty Estimation by Fisher Information-based Evidential Deep Learning. (arXiv:2303.02045v3 [cs.LG] UPDATED)
    Uncertainty estimation is a key factor that makes deep learning reliable in practical applications. Recently proposed evidential neural networks explicitly account for different uncertainties by treating the network's outputs as evidence to parameterize the Dirichlet distribution, and achieve impressive performance in uncertainty estimation. However, for samples with high data uncertainty that are nonetheless annotated with a one-hot label, the evidence-learning process for the unlabeled classes is over-penalized and remains hindered. To address this problem, we propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL). In particular, we introduce the Fisher Information Matrix (FIM) to measure the informativeness of evidence carried by each sample, according to which we can dynamically reweight the objective loss terms to make the network focus more on the representation learning of uncertain classes. The generalization ability of our network is further improved by optimizing the PAC-Bayesian bound. As demonstrated empirically, our proposed method consistently outperforms traditional EDL-related algorithms in multiple uncertainty estimation tasks, especially in the more challenging few-shot classification settings.
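The Dirichlet bookkeeping underlying evidential networks is simple enough to sketch; the following is our own minimal illustration of the standard evidence-to-uncertainty mapping (not the paper's $\mathcal{I}$-EDL reweighting):

```python
# Standard evidential-DL bookkeeping: non-negative per-class evidence
# parameterizes a Dirichlet with alpha_k = evidence_k + 1; the expected class
# probabilities are alpha_k / S and the total-uncertainty mass is u = K / S,
# so more evidence means less uncertainty.

def dirichlet_uncertainty(evidence):
    K = len(evidence)
    alpha = [e + 1.0 for e in evidence]
    S = sum(alpha)
    probs = [a / S for a in alpha]
    return probs, K / S

no_evidence = dirichlet_uncertainty([0.0, 0.0, 0.0])      # maximal uncertainty
strong_evidence = dirichlet_uncertainty([10.0, 0.0, 0.0])  # low uncertainty
```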
    WDC Products: A Multi-Dimensional Entity Matching Benchmark. (arXiv:2301.09521v2 [cs.LG] UPDATED)
    The difficulty of an entity matching task depends on a combination of multiple factors such as the amount of corner-case pairs, the fraction of entities in the test set that have not been seen during training, and the size of the development set. Current entity matching benchmarks usually represent single points in the space along such dimensions or they provide for the evaluation of matching methods along a single dimension, for instance the amount of training data. This paper presents WDC Products, an entity matching benchmark which provides for the systematic evaluation of matching systems along combinations of three dimensions while relying on real-world data. The three dimensions are (i) the amount of corner-cases, (ii) generalization to unseen entities, and (iii) development set size (training set plus validation set). Generalization to unseen entities is a dimension not covered by any of the existing English-language benchmarks yet but is crucial for evaluating the robustness of entity matching systems. Instead of learning how to match entity pairs, entity matching can also be formulated as a multi-class classification task that requires the matcher to recognize individual entities. WDC Products is the first benchmark that provides a pair-wise and a multi-class formulation of the same tasks. We evaluate WDC Products using several state-of-the-art matching systems, including Ditto, HierGAT, and R-SupCon. The evaluation shows that all matching systems struggle with unseen entities to varying degrees. It also shows that for entity matching contrastive learning is more training data efficient compared to cross-encoders.
    ERL-Re$^2$: Efficient Evolutionary Reinforcement Learning with Shared State Representation and Individual Policy Representation. (arXiv:2210.17375v2 [cs.NE] UPDATED)
    Deep Reinforcement Learning (Deep RL) and Evolutionary Algorithms (EA) are two major paradigms of policy optimization with distinct learning principles, i.e., gradient-based vs. gradient-free. An appealing research direction is integrating Deep RL and EA to devise new methods by fusing their complementary advantages. However, existing works on combining Deep RL and EA have two common drawbacks: 1) the RL agent and EA agents learn their policies individually, neglecting efficient sharing of useful common knowledge; 2) parameter-level policy optimization guarantees no semantic level of behavior evolution for the EA side. In this paper, we propose Evolutionary Reinforcement Learning with Two-scale State Representation and Policy Representation (ERL-Re$^2$), a novel solution to the aforementioned two drawbacks. The key idea of ERL-Re$^2$ is two-scale representation: all EA and RL policies share the same nonlinear state representation while maintaining individual linear policy representations. The state representation conveys expressive common features of the environment learned by all the agents collectively; the linear policy representation provides a favorable space for efficient policy optimization, where novel behavior-level crossover and mutation operations can be performed. Moreover, the linear policy representation allows convenient generalization of policy fitness with the help of the Policy-extended Value Function Approximator (PeVFA), further improving the sample efficiency of fitness estimation. The experiments on a range of continuous control tasks show that ERL-Re$^2$ consistently outperforms advanced baselines and achieves state-of-the-art (SOTA) performance. Our code is available at https://github.com/yeshenpy/ERL-Re2.
    LTD: Low Temperature Distillation for Robust Adversarial Training. (arXiv:2111.02331v3 [cs.CV] UPDATED)
    Adversarial training has been widely used to enhance the robustness of neural network models against adversarial attacks. Despite the popularity of neural network models, a significant gap exists between the natural and robust accuracy of these models. In this paper, we identify that one of the primary reasons for this gap is the common use of one-hot vectors as labels, which hinders the learning process for image recognition. Representing ambiguous images with one-hot vectors is imprecise and may lead the model to suboptimal solutions. To overcome this issue, we propose a novel method called Low Temperature Distillation (LTD) that generates soft labels using a modified knowledge distillation framework. Unlike previous approaches, LTD uses a relatively low temperature in the teacher model and fixed, but different, temperatures for the teacher and student models. This modification boosts the model's robustness without encountering the gradient masking problem that has been addressed in defensive distillation. The experimental results demonstrate the effectiveness of the proposed LTD method combined with previous techniques, achieving robust accuracy rates of 58.19%, 31.13%, and 42.08% on the CIFAR-10, CIFAR-100, and ImageNet data sets, respectively, without additional unlabeled data.
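The temperature mechanics behind soft labels are easy to demonstrate; here is our own sketch with made-up logits (not the paper's training code), showing that the same logits produce sharper soft labels at a low temperature and smoother ones at a high temperature:

```python
import math

# Temperature-scaled softmax: dividing logits by a temperature before the
# softmax controls how close the resulting soft label is to a one-hot vector.

def softmax_with_temperature(logits, temperature):
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 1.0, 0.5]                              # hypothetical logits
sharp = softmax_with_temperature(logits, 0.5)   # low temperature: near one-hot
smooth = softmax_with_temperature(logits, 2.0)  # high temperature: softer
```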
    Geometric Autoencoders -- What You See is What You Decode. (arXiv:2306.17638v1 [cs.LG])
    Visualization is a crucial step in exploratory data analysis. One possible approach is to train an autoencoder with low-dimensional latent space. Large network depth and width can help unfold the data. However, such expressive networks can achieve low reconstruction error even when the latent representation is distorted. To avoid such misleading visualizations, we propose first a differential geometric perspective on the decoder, leading to insightful diagnostics for an embedding's distortion, and second a new regularizer mitigating such distortion. Our ``Geometric Autoencoder'' avoids stretching the embedding spuriously, so that the visualization captures the data structure more faithfully. It also flags areas where low distortion could not be achieved, thus guarding against misinterpretation.
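A crude hand-rolled diagnostic in the spirit of such distortion measures (our own sketch, not the authors' implementation) is to finite-difference the decoder along each latent axis and compare the stretch factors; a ratio of 1 means the axes are mapped with locally equal length:

```python
import math

# Stretch-ratio diagnostic: perturb each latent coordinate by eps, measure how
# far the decoded point moves, and compare the largest and smallest stretch
# factors. A ratio of 1 indicates no spurious local stretching.

def column_stretch_ratio(decoder, z, eps=1e-5):
    base = decoder(z)
    norms = []
    for i in range(len(z)):
        z_pert = list(z)
        z_pert[i] += eps
        norms.append(math.dist(decoder(z_pert), base) / eps)
    return max(norms) / min(norms)

isometric = lambda z: (z[0], z[1], 0.0)        # undistorted embedding
stretched = lambda z: (5.0 * z[0], z[1], 0.0)  # spurious 5x stretch of axis 0
```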
    A method to integrate and classify normal distributions. (arXiv:2012.14331v9 [stat.ML] UPDATED)
    Univariate and multivariate normal probability distributions are widely used when modeling decisions under uncertainty. Computing the performance of such models requires integrating these distributions over specific domains, which can vary widely across models. Aside from a few special cases, there exist no general analytical expressions, standard numerical methods, or software for these integrals. Here we present mathematical results and open-source software that provide (i) the probability in any domain of a normal in any dimensions with any parameters, (ii) the probability density, cumulative distribution, and inverse cumulative distribution of any function of a normal vector, (iii) the classification errors among any number of normal distributions, the Bayes-optimal discriminability index and its relation to the operating characteristic, (iv) dimension reduction and visualizations for such problems, and (v) tests for how reliably these methods may be used on given data. We demonstrate these tools with vision research applications of detecting occluding objects in natural scenes and detecting camouflage.
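As a numerical sanity check for the kind of quantity the paper computes exactly, one can Monte Carlo estimate the mass of a normal in an arbitrary domain given by an indicator function (a sketch of ours, one-dimensional only):

```python
import random

# Monte Carlo estimate of P(X in domain) for X ~ N(mean, std^2): sample,
# count how many draws land in the domain, and divide by the sample size.

def mc_normal_prob(mean, std, in_domain, n=200_000, seed=0):
    rng = random.Random(seed)
    hits = sum(in_domain(rng.gauss(mean, std)) for _ in range(n))
    return hits / n

p_positive = mc_normal_prob(0.0, 1.0, lambda x: x > 0.0)  # close to 0.5
```

Unlike the exact expressions in the paper, this estimate carries sampling error of order $1/\sqrt{n}$.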
    A Gradient Smoothed Functional Algorithm with Truncated Cauchy Random Perturbations for Stochastic Optimization. (arXiv:2208.00290v4 [math.OC] UPDATED)
    In this paper, we present a stochastic gradient algorithm for minimizing a smooth objective function that is an expectation over noisy cost samples, and only the latter are observed for any given parameter. Our algorithm employs a gradient estimation scheme with random perturbations, which are formed using the truncated Cauchy distribution from the delta sphere. We analyze the bias and variance of the proposed gradient estimator. Our algorithm is found to be particularly useful in the case when the objective function is non-convex and the parameter dimension is high. From an asymptotic convergence analysis, we establish that our algorithm converges almost surely to the set of stationary points of the objective function, and we obtain its asymptotic convergence rate. We also show that our algorithm avoids unstable equilibria, implying convergence to local minima. Further, we perform a non-asymptotic convergence analysis of our algorithm. In particular, we establish a non-asymptotic bound for finding an epsilon-stationary point of the non-convex objective function. Finally, we demonstrate numerically through simulations that our algorithm outperforms GSF, SPSA, and RDSA by a significant margin over a few non-convex settings, and we further validate its performance over convex (noisy) objectives.
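The general shape of such perturbation-based gradient estimators can be sketched with a two-sided SPSA-style estimate (our own illustration; the paper instead draws its perturbations from a truncated Cauchy distribution on the delta sphere, which we do not reproduce here):

```python
import random

# Two-sided random-perturbation gradient estimate: evaluate f at x + delta*d
# and x - delta*d for a random direction d, and divide the finite difference
# by each perturbation component. A single estimate is noisy but unbiased.

def two_sided_perturbed_grad(f, x, delta=1e-3, rng=None):
    rng = rng or random.Random(0)
    d = [rng.choice([-1.0, 1.0]) for _ in x]  # Rademacher directions
    f_plus = f([xi + delta * di for xi, di in zip(x, d)])
    f_minus = f([xi - delta * di for xi, di in zip(x, d)])
    diff = (f_plus - f_minus) / (2.0 * delta)
    return [diff / di for di in d]
```

Averaging the estimate over many random perturbations recovers the true gradient, e.g. approximately $[2, -4]$ for $f(x) = \sum_i x_i^2$ at $x = (1, -2)$.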
    Understanding Unfairness via Training Concept Influence. (arXiv:2306.17828v1 [cs.LG])
    Knowing the causes of a model's unfairness helps practitioners better understand their data and algorithms. This is an important yet relatively unexplored task. We look into this problem through the lens of the training data - one of the major sources of unfairness. We ask the following questions: how would a model's fairness performance change if, in its training data, some samples (1) were collected from a different (e.g. demographic) group, (2) were labeled differently, or (3) some features were changed? In other words, we quantify the fairness influence of training samples by counterfactually intervening and changing samples based on predefined concepts, i.e. data attributes such as features (X), labels (Y), or sensitive attributes (A). To calculate a training sample's influence on the model's unfairness w.r.t a concept, we first generate counterfactual samples based on the concept, i.e. the counterfactual versions of the sample if the concept were changed. We then calculate the resulting impact on the unfairness, via influence function, if the counterfactual samples were used in training. Our framework not only helps practitioners understand the observed unfairness and repair their training data, but also leads to many other applications, e.g. detecting mislabeling, fixing imbalanced representations, and detecting fairness-targeted poisoning attacks.
    TD Convergence: An Optimization Perspective. (arXiv:2306.17750v1 [cs.LG])
    We study the convergence behavior of the celebrated temporal-difference (TD) learning algorithm. By looking at the algorithm through the lens of optimization, we first argue that TD can be viewed as an iterative optimization algorithm where the function to be minimized changes per iteration. By carefully investigating the divergence displayed by TD on a classical counter example, we identify two forces that determine the convergent or divergent behavior of the algorithm. We next formalize our discovery in the linear TD setting with quadratic loss and prove that convergence of TD hinges on the interplay between these two forces. We extend this optimization perspective to prove convergence of TD in a much broader setting than just linear approximation and squared loss. Our results provide a theoretical explanation for the successful application of TD in reinforcement learning.
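The algorithm under study is concrete enough to state in a few lines; here is a tabular TD(0) sketch (our illustration of the classical update, not the paper's analysis code). The bootstrapped target itself moves as the value estimates change, which is exactly the "objective changing per iteration" view taken in the paper:

```python
# Tabular TD(0): after each transition (s, r, s'), nudge V(s) toward the
# bootstrapped target r + gamma * V(s'); terminal transitions use s' = None.

def td0(episodes, alpha=0.1, gamma=0.9, num_states=3):
    V = [0.0] * num_states
    for episode in episodes:
        for s, r, s_next in episode:
            target = r + (gamma * V[s_next] if s_next is not None else 0.0)
            V[s] += alpha * (target - V[s])
    return V

# Deterministic chain 0 -> 1 -> 2 -> terminal with reward 1 on the last step;
# the values approach [gamma**2, gamma, 1].
values = td0([[(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)]] * 1000)
```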
    Interactive Volume Visualization via Multi-Resolution Hash Encoding based Neural Representation. (arXiv:2207.11620v3 [cs.GR] UPDATED)
    Neural networks have shown great potential in compressing volume data for visualization. However, due to the high cost of training and inference, such volumetric neural representations have thus far only been applied to offline data processing and non-interactive rendering. In this paper, we demonstrate that by simultaneously leveraging modern GPU tensor cores, a native CUDA neural network framework, and a well-designed rendering algorithm with macro-cell acceleration, we can interactively ray trace volumetric neural representations (10-60fps). Our neural representations are also high-fidelity (PSNR > 30dB) and compact (10-1000x smaller). Additionally, we show that it is possible to fit the entire training step inside a rendering loop and skip the pre-training process completely. To support extreme-scale volume data, we also develop an efficient out-of-core training strategy, which allows our volumetric neural representation training to potentially scale up to terascale using only an NVIDIA RTX 3090 workstation.
    Generalized Time Warping Invariant Dictionary Learning for Time Series Classification and Clustering. (arXiv:2306.17690v1 [stat.ML])
    Dictionary learning is an effective tool for pattern recognition and classification of time series data. Among various dictionary learning techniques, dynamic time warping (DTW) is commonly used for dealing with temporal delays, scaling, transformation, and many other kinds of temporal misalignment issues. However, DTW suffers from overfitting or information loss due to its discrete nature in aligning time series data. To address this issue, we propose a generalized time warping invariant dictionary learning algorithm in this paper. Our approach features a generalized time warping operator, which consists of linear combinations of continuous basis functions to facilitate continuous temporal warping. The integration of the proposed operator and the dictionary learning is formulated as an optimization problem, where the block coordinate descent method is employed to jointly optimize warping paths, dictionaries, and sparseness coefficients. The optimized results are then used as hyperspace distance measures to feed classification and clustering algorithms. The superiority of the proposed method in terms of dictionary learning, classification, and clustering is validated on ten public datasets in comparison with various benchmark methods.
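The discrete baseline that the paper generalizes is classic DTW, whose dynamic program fits in a dozen lines (a textbook sketch of ours, not the paper's continuous-warping operator):

```python
# Classic discrete DTW distance between two 1-D sequences: D[i][j] holds the
# minimal cumulative alignment cost of the first i elements of a against the
# first j elements of b, built up by dynamic programming.

def dtw(a, b):
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

d_warp = dtw([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])  # 0.0: warping absorbs the repeat
```

The discrete, integer-index alignment visible in the `min` recursion is precisely what the paper replaces with linear combinations of continuous basis functions.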
    Approximating Probability Distributions by using Wasserstein Generative Adversarial Networks. (arXiv:2103.10060v4 [cs.LG] UPDATED)
    Studied here are Wasserstein generative adversarial networks (WGANs) with GroupSort neural networks as their discriminators. It is shown that the error bound of the approximation for the target distribution depends on the width and depth (capacity) of the generators and discriminators and the number of samples in training. A quantified generalization bound is established for the Wasserstein distance between the generated and target distributions. According to the theoretical results, WGANs have a higher requirement for the capacity of discriminators than that of generators, which is consistent with some existing results. More importantly, the results with overly deep and wide (high-capacity) generators may be worse than those with low-capacity generators if discriminators are insufficiently strong. Numerical results obtained using Swiss roll and MNIST datasets confirm the theoretical results.
    Structure-based Drug Design with Equivariant Diffusion Models. (arXiv:2210.13695v2 [q-bio.BM] UPDATED)
    Structure-based drug design (SBDD) aims to design small-molecule ligands that bind with high affinity and specificity to pre-determined protein targets. In this paper, we formulate SBDD as a 3D-conditional generation problem and present DiffSBDD, an SE(3)-equivariant 3D-conditional diffusion model that generates novel ligands conditioned on protein pockets. Comprehensive in silico experiments demonstrate the efficiency and effectiveness of DiffSBDD in generating novel and diverse drug-like ligands with competitive docking scores. We further explore the flexibility of the diffusion framework for a broader range of tasks in drug design campaigns, such as off-the-shelf property optimization and partial molecular design with inpainting.
    Accuracy Boosters: Epoch-Driven Mixed-Mantissa Block Floating-Point for DNN Training. (arXiv:2211.10737v3 [cs.LG] UPDATED)
    The unprecedented growth in DNN model complexity, size, and amount of training data has led to a commensurate increase in demand for computing and a search for minimal encoding. Recent research advocates Hybrid Block Floating Point (HBFP) to minimize silicon provisioning in accelerators by converting the majority of arithmetic operations in training to 8-bit fixed point. In this paper, we perform a full-scale exploration of the HBFP design space using mathematical tools to study the interplay among various parameters and identify opportunities for even smaller encodings across layers and epochs. Based on our findings, we propose Accuracy Boosters, an epoch-driven mixed-mantissa HBFP technique that uses 6-bit mantissas only in the last epoch and first/last layers, and 4-bit mantissas for $99.7\%$ of all other arithmetic operations in training. Using analytic models, we show Accuracy Boosters enable increasing arithmetic density for an HBFP training accelerator by up to $21.3\times$ compared to FP32 and up to $4.4\times$ compared to another SOTA format Bfloat16, while preserving or outperforming FP32 accuracy.
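    The shared-exponent idea behind block floating point can be sketched in a few lines; this is a hedged illustration with made-up names, not the HBFP implementation:

```python
import math

def quantize_bfp_block(block, mantissa_bits):
    # Illustrative block-floating-point step: all values in a block share
    # the exponent of the largest magnitude, and each value keeps only
    # `mantissa_bits` bits of signed mantissa (symmetric range).
    max_abs = max(abs(v) for v in block)
    if max_abs == 0.0:
        return list(block)
    shared_exp = math.ceil(math.log2(max_abs))
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    levels = 2 ** (mantissa_bits - 1) - 1
    return [max(-levels, min(levels, round(v / scale))) * scale for v in block]

block = [0.9, -0.31, 0.05, 0.002]
print(quantize_bfp_block(block, 6))   # finer grid, smaller error
print(quantize_bfp_block(block, 4))   # coarser grid, larger error
```

The mixed-mantissa idea in the abstract amounts to choosing `mantissa_bits` per epoch and per layer (e.g., 4 for most operations, 6 for the last epoch and first/last layers).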
    ChatGPT for Robotics: Design Principles and Model Abilities. (arXiv:2306.17582v1 [cs.AI])
    This paper presents an experimental study regarding the use of OpenAI's ChatGPT for robotics applications. We outline a strategy that combines design principles for prompt engineering and the creation of a high-level function library which allows ChatGPT to adapt to different robotics tasks, simulators, and form factors. We focus our evaluations on the effectiveness of different prompt engineering techniques and dialog strategies towards the execution of various types of robotics tasks. We explore ChatGPT's ability to use free-form dialog, parse XML tags, and synthesize code, in addition to the use of task-specific prompting functions and closed-loop reasoning through dialogues. Our study encompasses a range of tasks within the robotics domain, from basic logical, geometrical, and mathematical reasoning all the way to complex domains such as aerial navigation, manipulation, and embodied agents. We show that ChatGPT can be effective at solving several such tasks, while allowing users to interact with it primarily via natural language instructions. In addition to these studies, we introduce an open-source research tool called PromptCraft, which contains a platform where researchers can collaboratively upload and vote on examples of good prompting schemes for robotics applications, as well as a sample robotics simulator with ChatGPT integration, making it easier for users to get started with using ChatGPT for robotics.
    Sequential Recommendation Model for Next Purchase Prediction. (arXiv:2207.06225v2 [cs.IR] UPDATED)
    Timeliness and contextual accuracy of recommendations are increasingly important when delivering contemporary digital marketing experiences. Conventional recommender systems (RS) suggest relevant but time-invariant items to users by accounting for their past purchases. These recommendations only map to customers' general preferences rather than a customer's specific needs immediately preceding a purchase. In contrast, RSs that consider the order of transactions, purchases, or experiences to measure evolving preferences can offer more salient and effective recommendations to customers: Sequential RSs not only benefit from a better behavioral understanding of a user's current needs but also from better predictive power. In this paper, we demonstrate and rank the effectiveness of a sequential recommendation system by utilizing a production dataset of over 2.7 million credit card transactions for 46K cardholders. The method first employs an autoencoder on raw transaction data and submits observed transaction encodings to a GRU-based sequential model. The sequential model produces a MAP@1 metric of 47% on the out-of-sample test set, in line with existing research. We also discuss implications for embedding real-time predictions using the sequential RS into Nexus, a scalable, low-latency, event-based digital experience architecture.
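    At a cutoff of one, the reported MAP@1 metric reduces to top-1 accuracy over held-out next purchases. A minimal sketch with hypothetical data:

```python
def map_at_1(predictions, actuals):
    # For a cutoff of 1, mean average precision reduces to top-1 accuracy:
    # the fraction of users whose single top-ranked item was actually purchased.
    hits = sum(1 for ranked, truth in zip(predictions, actuals)
               if ranked and ranked[0] == truth)
    return hits / len(actuals)

# Hypothetical ranked next-purchase predictions for four cardholders.
preds = [["grocery", "fuel"], ["fuel", "dining"], ["dining"], ["travel", "grocery"]]
truth = ["grocery", "dining", "dining", "travel"]
print(map_at_1(preds, truth))  # 3 of 4 top-1 predictions are correct -> 0.75
```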
    Improving Expert Predictions with Conformal Prediction. (arXiv:2201.12006v5 [cs.LG] UPDATED)
    Automated decision support systems promise to help human experts solve multiclass classification tasks more efficiently and accurately. However, existing systems typically require experts to understand when to cede agency to the system or when to exercise their own agency. Otherwise, the experts may be better off solving the classification tasks on their own. In this work, we develop an automated decision support system that, by design, does not require experts to understand when to trust the system to improve performance. Rather than providing (single) label predictions and letting experts decide when to trust these predictions, our system provides sets of label predictions constructed using conformal prediction, known as prediction sets, and forcefully asks experts to predict labels from these sets. By using conformal prediction, our system can precisely trade off the probability that the true label is not in the prediction set, which determines how frequently our system will mislead the experts, against the size of the prediction set, which determines the difficulty of the classification task the experts need to solve using our system. In addition, we develop an efficient and near-optimal search method to find the conformal predictor under which the experts benefit the most from using our system. Simulation experiments using synthetic and real expert predictions demonstrate that our system may help experts make more accurate predictions and is robust to the accuracy of the classifier the conformal predictor relies on.
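    The prediction-set construction can be illustrated with plain split conformal prediction; this is a generic sketch (the paper's search over conformal predictors is not shown), and all numbers are made up:

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha):
    # Split conformal prediction with the score s = 1 - p(true label).
    # The threshold is the ceil((n+1)(1-alpha))/n empirical quantile of scores.
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = min(n - 1, math.ceil((n + 1) * (1.0 - alpha)) - 1)
    return scores[k]

def prediction_set(probs, qhat):
    # Keep every label whose score does not exceed the calibrated threshold;
    # the expert is then asked to pick a label from this set.
    return [label for label, p in enumerate(probs) if 1.0 - p <= qhat]

# Hypothetical classifier probabilities over 3 labels on a calibration split.
cal_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4],
             [0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
cal_labels = [0, 1, 2, 0, 1]
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.2)
print(prediction_set([0.5, 0.4, 0.1], qhat))
```

Lowering `alpha` shrinks the chance the true label is missing from the set, at the cost of larger (harder-to-resolve) sets, which is exactly the trade-off the abstract describes.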
    Longitudinal Multimodal Transformer Integrating Imaging and Latent Clinical Signatures From Routine EHRs for Pulmonary Nodule Classification. (arXiv:2304.02836v5 [eess.IV] UPDATED)
    The accuracy of predictive models for solitary pulmonary nodule (SPN) diagnosis can be greatly increased by incorporating repeat imaging and medical context, such as electronic health records (EHRs). However, clinically routine modalities such as imaging and diagnostic codes can be asynchronous and irregularly sampled over different time scales which are obstacles to longitudinal multimodal learning. In this work, we propose a transformer-based multimodal strategy to integrate repeat imaging with longitudinal clinical signatures from routinely collected EHRs for SPN classification. We perform unsupervised disentanglement of latent clinical signatures and leverage time-distance scaled self-attention to jointly learn from clinical signatures expressions and chest computed tomography (CT) scans. Our classifier is pretrained on 2,668 scans from a public dataset and 1,149 subjects with longitudinal chest CTs, billing codes, medications, and laboratory tests from EHRs of our home institution. Evaluation on 227 subjects with challenging SPNs revealed a significant AUC improvement over a longitudinal multimodal baseline (0.824 vs 0.752 AUC), as well as improvements over a single cross-section multimodal scenario (0.809 AUC) and a longitudinal imaging-only scenario (0.741 AUC). This work demonstrates significant advantages with a novel approach for co-learning longitudinal imaging and non-imaging phenotypes with transformers. Code available at https://github.com/MASILab/lmsignatures.
    Robust Implicit Regularization via Weight Normalization. (arXiv:2305.05448v2 [cs.LG] UPDATED)
    Overparameterized models may have many interpolating solutions; implicit regularization refers to the hidden preference of a particular optimization method towards a certain interpolating solution among the many. A by now established line of work has shown that (stochastic) gradient descent tends to have an implicit bias towards low rank and/or sparse solutions when used to train deep linear networks, explaining to some extent why overparameterized neural network models trained by gradient descent tend to have good generalization performance in practice. However, existing theory for square-loss objectives often requires very small initialization of the trainable weights, which is at odds with the larger scale at which weights are initialized in practice for faster convergence and better generalization performance. In this paper, we aim to close this gap by incorporating and analyzing gradient descent with weight normalization, where the weight vector is reparameterized in terms of polar coordinates, and gradient descent is applied to the polar coordinates. By analyzing key invariants of the gradient flow and using Lojasiewicz's Theorem, we show that weight normalization also has an implicit bias towards sparse solutions in the diagonal linear model, but that in contrast to plain gradient descent, weight normalization enables a robust bias that persists even if the weights are initialized at practically large scale. Experiments suggest that both the convergence speed and the robustness of the implicit bias improve dramatically when weight normalization is used in overparameterized diagonal linear network models.
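    The reparameterization at the heart of weight normalization can be sketched in a few lines; this is the standard w = g * v / ||v|| form, shown only for illustration (the paper analyzes an equivalent polar-coordinate view):

```python
import math

def weight_norm(v, g):
    # Weight normalization decouples the length g (one scalar) from the
    # direction v; gradient descent is then run on (g, v) instead of w.
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / norm for x in v]

v = [3.0, 4.0]
w = weight_norm(v, 2.0)
print(w)                    # direction of v, rescaled to length g
print(math.hypot(*w))       # ||w|| equals g regardless of ||v||
```

Because rescaling `v` leaves `w` unchanged, the scale of the initialization affects the dynamics far less than in the plain parameterization, which is the intuition behind the robustness result.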
    Towards Complex Dynamic Physics System Simulation with Graph Neural ODEs. (arXiv:2305.12334v4 [cs.LG] UPDATED)
    The great learning ability of deep learning models facilitates us to comprehend the real physical world, making learning to simulate complicated particle systems a promising endeavour. However, the complex laws of the physical world pose significant challenges to learning based simulations, such as the varying spatial dependencies between interacting particles and varying temporal dependencies between particle system states in different time stamps, which dominate particles' interacting behaviour and the physical systems' evolution patterns. Existing learning based simulation methods fail to fully account for the complexities, making them unable to yield satisfactory simulations. To better comprehend the complex physical laws, this paper proposes a novel learning based simulation model, Graph Networks with Spatial-Temporal neural Ordinary Differential Equations (GNSTODE), that characterizes the varying spatial and temporal dependencies in particle systems using a unified end-to-end framework. Through training with real-world particle-particle interaction observations, GNSTODE is able to simulate any possible particle systems with high precision. We empirically evaluate GNSTODE's simulation performance on two real-world particle systems, Gravity and Coulomb, with varying levels of spatial and temporal dependencies. The results show that the proposed GNSTODE yields significantly better simulations than state-of-the-art learning based simulation methods, which proves that GNSTODE can serve as an effective solution to particle simulations in real-world applications.
    Graphtester: Exploring Theoretical Boundaries of GNNs on Graph Datasets. (arXiv:2306.17482v1 [cs.LG])
    Graph Neural Networks (GNNs) have emerged as a powerful tool for learning from graph-structured data. However, even state-of-the-art architectures have limitations on what structures they can distinguish, imposing theoretical limits on what the networks can achieve on different datasets. In this paper, we provide a new tool called Graphtester for a comprehensive analysis of the theoretical capabilities of GNNs for various datasets, tasks, and scores. We use Graphtester to analyze over 40 different graph datasets, determining upper bounds on the performance of various GNNs based on the number of layers. Further, we show that the tool can also be used for Graph Transformers using positional node encodings, thereby expanding its scope. Finally, we demonstrate that features generated by Graphtester can be used for practical applications such as Graph Transformers, and provide a synthetic dataset to benchmark node and edge features, such as positional encodings. The package is freely available at the following URL: https://github.com/meakbiyik/graphtester.
    Look, Remember and Reason: Visual Reasoning with Grounded Rationales. (arXiv:2306.17778v1 [cs.CV])
    Large language models have recently shown human-level performance on a variety of reasoning tasks. However, the ability of these models to perform complex visual reasoning has not been studied in detail yet. A key challenge in many visual reasoning tasks is that the visual information needs to be tightly integrated in the reasoning process. We propose to address this challenge by drawing inspiration from human visual problem solving, which depends on a variety of low-level visual capabilities. It can often be cast as the three-step process of "Look, Remember, Reason": visual information is incrementally extracted using low-level visual routines in a step-by-step fashion until a final answer is reached. We follow the same paradigm to enable existing large language models, with minimal changes to the architecture, to solve visual reasoning problems. To this end, we introduce rationales over the visual input that allow us to integrate low-level visual capabilities, such as object recognition and tracking, as surrogate tasks. We show competitive performance on diverse visual reasoning tasks from the CLEVR, CATER, and ACRE datasets over state-of-the-art models designed specifically for these tasks.
    Private Online Prediction from Experts: Separations and Faster Rates. (arXiv:2210.13537v3 [cs.LG] UPDATED)
    Online prediction from experts is a fundamental problem in machine learning and several works have studied this problem under privacy constraints. We propose and analyze new algorithms for this problem that improve over the regret bounds of the best existing algorithms for non-adaptive adversaries. For approximate differential privacy, our algorithms achieve regret bounds of $\tilde{O}(\sqrt{T \log d} + \log d/\varepsilon)$ for the stochastic setting and $\tilde{O}(\sqrt{T \log d} + T^{1/3} \log d/\varepsilon)$ for oblivious adversaries (where $d$ is the number of experts). For pure DP, our algorithms are the first to obtain sub-linear regret for oblivious adversaries in the high-dimensional regime $d \ge T$. Moreover, we prove new lower bounds for adaptive adversaries. Our results imply that unlike the non-private setting, there is a strong separation between the optimal regret for adaptive and non-adaptive adversaries for this problem. Our lower bounds also show a separation between pure and approximate differential privacy for adaptive adversaries where the latter is necessary to achieve the non-private $O(\sqrt{T})$ regret.
    Why Deep Models Often cannot Beat Non-deep Counterparts on Molecular Property Prediction?. (arXiv:2306.17702v1 [cs.LG])
    Molecular property prediction (MPP) is a crucial task in the drug discovery pipeline, which has recently gained considerable attention thanks to advances in deep neural networks. However, recent research has revealed that deep models struggle to beat traditional non-deep ones on MPP. In this study, we benchmark 12 representative models (3 non-deep models and 9 deep models) on 14 molecule datasets. Through the most comprehensive study to date, we make the following key observations: (i) Deep models are generally unable to outperform non-deep ones; (ii) The failure of deep models on MPP cannot be solely attributed to the small size of molecular datasets. What matters is the irregular molecule data pattern; (iii) In particular, tree models using molecular fingerprints as inputs tend to perform better than other competitors. Furthermore, we conduct extensive empirical investigations into the unique patterns of molecule data and inductive biases of various models underlying these phenomena.
    Enhancing training of physics-informed neural networks using domain-decomposition based preconditioning strategies. (arXiv:2306.17648v1 [math.NA])
    We propose to enhance the training of physics-informed neural networks (PINNs). To this aim, we introduce nonlinear additive and multiplicative preconditioning strategies for the widely used L-BFGS optimizer. The nonlinear preconditioners are constructed by utilizing the Schwarz domain-decomposition framework, where the parameters of the network are decomposed in a layer-wise manner. Through a series of numerical experiments, we demonstrate that both additive and multiplicative preconditioners significantly improve the convergence of the standard L-BFGS optimizer, while providing more accurate solutions of the underlying partial differential equations. Moreover, the additive preconditioner is inherently parallel, thus giving rise to a novel approach to model parallelism.
    Bounded (O(1)) Regret Recommendation Learning via Synthetic Controls Oracle. (arXiv:2301.12571v2 [cs.LG] UPDATED)
    In online exploration systems where users with fixed preferences repeatedly arrive, it has recently been shown that O(1), i.e., bounded regret, can be achieved when the system is modeled as a linear contextual bandit. This result may be of interest for recommender systems, where the popularity of their items is often short-lived, as the exploration itself may be completed quickly before potential long-run non-stationarities come into play. However, in practice, exact knowledge of the linear model is difficult to justify. Furthermore, potential existence of unobservable covariates, uneven user arrival rates, interpretation of the necessary rank condition, and users opting out of private data tracking all need to be addressed for practical recommender system applications. In this work, we conduct a theoretical study to address all these issues while still achieving bounded regret. Aside from proof techniques, the key differentiating assumption we make here is the presence of effective Synthetic Control Methods (SCM), which are shown to be a practical relaxation of the exact linear model knowledge assumption. We verify our theoretical bounded regret result using a minimal simulation experiment.
    Impact of Noise on Calibration and Generalisation of Neural Networks. (arXiv:2306.17630v1 [cs.LG])
    Noise injection and data augmentation strategies have been effective for enhancing the generalisation and robustness of neural networks (NNs). Certain types of noise such as label smoothing and MixUp have also been shown to improve calibration. Since noise can be added in various stages of the NN's training, it motivates the question of when and where the noise is the most effective. We study a variety of noise types to determine how much they improve calibration and generalisation, and under what conditions. More specifically, we evaluate various noise-injection strategies in both in-distribution (ID) and out-of-distribution (OOD) scenarios. The findings highlight that activation noise was the most transferable and effective in improving generalisation, while input augmentation noise was prominent in improving calibration on OOD but not necessarily ID data.
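    As a hypothetical sketch of one of the studied noise types, activation noise can be added to a toy ReLU layer during training and disabled at evaluation; all names and values below are illustrative:

```python
import random

def forward_hidden(x, weights, noise_std=0.0, rng=None):
    # Minimal sketch of activation-noise injection: Gaussian noise is added
    # to hidden activations during training (noise_std > 0) and disabled at
    # evaluation time (noise_std = 0). Illustrative only, not the paper's setup.
    rng = rng or random.Random(0)
    acts = [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in weights]
    if noise_std > 0:
        acts = [a + rng.gauss(0.0, noise_std) for a in acts]
    return acts

x = [1.0, -2.0]
weights = [[0.5, 0.1], [-0.3, 0.2]]
clean = forward_hidden(x, weights)                 # deterministic eval pass
noisy = forward_hidden(x, weights, noise_std=0.1)  # perturbed training pass
print(clean, noisy)
```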
    Achieving RGB-D level Segmentation Performance from a Single ToF Camera. (arXiv:2306.17636v1 [cs.CV])
    Depth is a very important modality in computer vision, typically used as complementary information to RGB, provided by RGB-D cameras. In this work, we show that it is possible to obtain the same level of accuracy as RGB-D cameras on a semantic segmentation task using infrared (IR) and depth images from a single Time-of-Flight (ToF) camera. In order to fuse the IR and depth modalities of the ToF camera, we introduce a method utilizing depth-specific convolutions in a multi-task learning framework. In our evaluation on an in-car segmentation dataset, we demonstrate the competitiveness of our method against the more costly RGB-D approaches.
    Stay on topic with Classifier-Free Guidance. (arXiv:2306.17806v1 [cs.CL])
    Classifier-Free Guidance (CFG) has recently emerged in text-to-image generation as a lightweight technique to encourage prompt-adherence in generations. In this work, we demonstrate that CFG can be used broadly as an inference-time technique in pure language modeling. We show that CFG (1) improves the performance of Pythia, GPT-2 and LLaMA-family models across an array of tasks: Q&A, reasoning, code generation, and machine translation, achieving SOTA on LAMBADA with LLaMA-7B over PaLM-540B; (2) brings improvements equivalent to a model with twice the parameter-count; (3) can stack alongside other inference-time methods like Chain-of-Thought and Self-Consistency, yielding further improvements in difficult tasks; (4) can be used to increase the faithfulness and coherence of assistants in challenging form-driven and content-driven prompts: in a human evaluation we show a 75% preference for GPT4All using CFG over baseline.
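    The CFG combination rule can be sketched directly on next-token logits; the values and guidance strength below are illustrative:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cfg_logits(cond, uncond, gamma):
    # Classifier-free guidance at inference time: move the conditional logits
    # away from the unconditional ones by a guidance strength gamma.
    # gamma = 1 recovers ordinary conditional sampling.
    return [u + gamma * (c - u) for c, u in zip(cond, uncond)]

# Hypothetical next-token logits with and without the prompt in context.
cond = [2.0, 1.0, 0.5]
uncond = [1.5, 1.2, 1.0]
p1 = softmax(cfg_logits(cond, uncond, 1.0))   # identical to softmax(cond)
p15 = softmax(cfg_logits(cond, uncond, 1.5))  # sharpened toward the prompt
print(p1)
print(p15)
```

Increasing `gamma` upweights tokens the prompt makes more likely relative to the unconditional distribution, which is the sense in which CFG keeps generations "on topic".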
    High Dimensional Data Enrichment: Interpretable, Fast, and Data-Efficient. (arXiv:1806.04047v4 [stat.ML] UPDATED)
    We consider the problem of multi-task learning in the high dimensional setting. In particular, we introduce an estimator and investigate its statistical and computational properties for the problem of multiple connected linear regressions known as Data Enrichment/Sharing. The between-tasks connections are captured by a cross-tasks \emph{common parameter}, which gets refined by per-task \emph{individual parameters}. Any convex function, e.g., norm, can characterize the structure of both common and individual parameters. We delineate the sample complexity of our estimator and provide a high probability non-asymptotic bound for estimation error of all parameters under a geometric condition. We show that the recovery of the common parameter benefits from \emph{all} of the pooled samples. We propose an iterative estimation algorithm with a geometric convergence rate and supplement our theoretical analysis with experiments on synthetic data. Overall, we present a first thorough statistical and computational analysis of inference in the data-sharing model.
    Physics-informed invertible neural network for the Koopman operator learning. (arXiv:2306.17396v1 [math.NA])
    In Koopman operator theory, a finite-dimensional nonlinear system is transformed into an infinite-dimensional but linear system using a set of observable functions. However, manually selecting observable functions that span the invariant subspace of the Koopman operator based on prior knowledge is inefficient and challenging, particularly when little or no information is available about the underlying systems. Furthermore, current methodologies tend to disregard the importance of the invertibility of observable functions, which leads to inaccurate results. To address these challenges, we propose the so-called FlowDMD, a Flow-based Dynamic Mode Decomposition that utilizes the Coupling Flow Invertible Neural Network (CF-INN) framework. FlowDMD leverages the intrinsically invertible characteristics of the CF-INN to learn the invariant subspaces of the Koopman operator and accurately reconstruct state variables. Numerical experiments demonstrate the superior performance of our algorithm compared to state-of-the-art methodologies.
    The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks. (arXiv:2306.17499v1 [cs.LG])
    We study the type of solutions to which stochastic gradient descent converges when used to train a single hidden-layer multivariate ReLU network with the quadratic loss. Our results are based on a dynamical stability analysis. In the univariate case, it was shown that linearly stable minima correspond to network functions (predictors), whose second derivative has a bounded weighted $L^1$ norm. Notably, the bound gets smaller as the step size increases, implying that training with a large step size leads to `smoother' predictors. Here we generalize this result to the multivariate case, showing that a similar result applies to the Laplacian of the predictor. We demonstrate the tightness of our bound on the MNIST dataset, and show that it accurately captures the behavior of the solutions as a function of the step size. Additionally, we prove a depth separation result on the approximation power of ReLU networks corresponding to stable minima of the loss. Specifically, although shallow ReLU networks are universal approximators, we prove that stable shallow networks are not. Namely, there is a function that cannot be well-approximated by stable single hidden-layer ReLU networks trained with a non-vanishing step size. The same function, however, can be realized as a stable two-hidden-layer ReLU network. Finally, we prove that if a function is sufficiently smooth (in a Sobolev sense) then it can be approximated arbitrarily well using shallow ReLU networks that correspond to stable solutions of gradient descent.
    Beyond Neural-on-Neural Approaches to Speaker Gender Protection. (arXiv:2306.17700v1 [eess.AS])
    Recent research has proposed approaches that modify speech to defend against gender inference attacks. The goal of these protection algorithms is to control the availability of information about a speaker's gender, a privacy-sensitive attribute. Currently, the common practice for developing and testing gender protection algorithms is "neural-on-neural", i.e., perturbations are generated and tested with a neural network. In this paper, we propose to go beyond this practice to strengthen the study of gender protection. First, we demonstrate the importance of testing gender inference attacks that are based on speech features historically developed by speech scientists, alongside the conventionally used neural classifiers. Next, we argue that researchers should use speech features to gain insight into how protective modifications change the speech signal. Finally, we point out that gender-protection algorithms should be compared with novel "vocal adversaries", human-executed voice adaptations, in order to improve interpretability and enable before-the-mic protection.
    Scaling Model Checking for DNN Analysis via State-Space Reduction and Input Segmentation (Extended Version). (arXiv:2306.17323v1 [cs.LG])
    Owing to their remarkable learning capabilities and performance in real-world applications, the use of machine learning systems based on Neural Networks (NNs) has been continuously increasing. However, various case studies and empirical findings in the literature suggest that slight variations to NN inputs can lead to erroneous and undesirable NN behavior. This has led to considerable interest in their formal analysis, aiming to provide guarantees regarding a given NN's behavior. Existing frameworks provide robustness and/or safety guarantees for the trained NNs, using satisfiability solving and linear programming. We proposed FANNet, the first model checking-based framework for analyzing a broader range of NN properties. However, the state-space explosion associated with model checking entails a scalability problem, making FANNet applicable only to small NNs. This work develops state-space reduction and input segmentation approaches, to improve the scalability and timing efficiency of formal NN analysis. Compared to the state-of-the-art FANNet, this enables our new model checking-based framework to reduce the verification's timing overhead by a factor of up to 8000, making the framework applicable to NNs even with approximately $80$ times more network parameters. This in turn allows the analysis of NN safety properties using the new framework, in addition to all the NN properties already included with FANNet. The framework is shown to efficiently analyze properties of NNs trained on healthcare datasets as well as the well-acknowledged ACAS Xu NNs.
    Class-Incremental Learning using Diffusion Model for Distillation and Replay. (arXiv:2306.17560v1 [cs.LG])
    Class-incremental learning aims to learn new classes in an incremental fashion without forgetting the previously learned ones. Several research works have shown how additional data can be used by incremental models to help mitigate catastrophic forgetting. In this work, following the recent breakthrough in text-to-image generative models and their wide distribution, we propose the use of a pretrained Stable Diffusion model as a source of additional data for class-incremental learning. Compared to competitive methods that rely on external, often unlabeled, datasets of real images, our approach can generate synthetic samples belonging to the same classes as the previously encountered images. This allows us to use those additional data samples not only in the distillation loss but also for replay in the classification loss. Experiments on the competitive benchmarks CIFAR100, ImageNet-Subset, and ImageNet demonstrate how this new approach can be used to further improve the performance of state-of-the-art methods for class-incremental learning on large scale datasets.
    Augmenting Holistic Review in University Admission using Natural Language Processing for Essays and Recommendation Letters. (arXiv:2306.17575v1 [cs.CL])
    University admission at many highly selective institutions uses a holistic review process, where all aspects of the application, including protected attributes (e.g., race, gender), grades, essays, and recommendation letters are considered, to compose an excellent and diverse class. In this study, we empirically evaluate how influential protected attributes are for predicting admission decisions using a machine learning (ML) model, and to what extent textual information (e.g., personal essay, teacher recommendation) may substitute for the loss of protected attributes in the model. Using data from 14,915 applicants to an undergraduate admission office at a selective U.S. institution in the 2022-2023 cycle, we find that the exclusion of protected attributes from the ML model leads to substantially reduced admission-prediction performance. The inclusion of textual information via both a TF-IDF representation and a Latent Dirichlet allocation (LDA) model partially restores model performance, but does not appear to provide a full substitute for admitting a similarly diverse class. In particular, while the text helps with gender diversity, the proportion of URM applicants is severely impacted by the exclusion of protected attributes, and the inclusion of new attributes generated from the textual information does not recover this performance loss.
    Lifelong Learning for Neural powered Mixed Integer Programming. (arXiv:2208.12226v3 [math.OC] UPDATED)
Mixed integer programs (MIPs) are typically solved by the branch-and-bound algorithm. Recently, learning to imitate fast approximations of the expert strong-branching heuristic has gained attention due to its success in reducing the running time for solving MIPs. However, existing learning-to-branch methods assume that the entire training data is available in a single session of training. This assumption is often not true, and if the training data is supplied in a continual fashion over time, existing techniques suffer from catastrophic forgetting. In this work, we study the hitherto unexplored paradigm of lifelong learning to branch on mixed integer programs. To mitigate catastrophic forgetting, we propose LIMIP, which is powered by the idea of modeling an MIP instance as a bipartite graph, which we map to an embedding space using a bipartite Graph Attention Network. This rich embedding space avoids catastrophic forgetting through the application of knowledge distillation and elastic weight consolidation, wherein we identify the parameters key to retaining efficacy and protect them from significant drift. We evaluate LIMIP on a series of NP-hard problems and establish that, in comparison to existing baselines, LIMIP is up to 50% better when confronted with lifelong learning.
    Hierarchical Bayesian Regression for Multi-Location Sales Transaction Forecasting. (arXiv:2306.17795v1 [stat.AP])
The features in many prediction models naturally take the form of a hierarchy. The lower levels represent individuals or events. These units group naturally into locations and intervals or other aggregates, often at multiple levels. Levels of groupings may intersect and join, much as relational database tables do. Besides representing the structure of the data, predictive features in hierarchical models can be assigned to their proper levels. Such models lend themselves to hierarchical Bayes solution methods that "share" results of inference between groups, generalizing over the extremes of one individual model per group versus a single model that aggregates all groups into one. In this paper we show our work-in-progress applying a hierarchical Bayesian model to forecast purchases throughout the day at store franchises, with groupings over locations and days of the week. We demonstrate using the Stan package on individual sales transaction data collected over the course of a year. We show how this solves the dilemma of having limited data and hence modest accuracy for each day and location, while being able to scale to a large number of locations with improved accuracy.
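The partial pooling at the heart of such a hierarchical model can be sketched without Stan. A minimal illustration (ours, not the paper's model), assuming hypothetical known within-group variance `sigma2` and between-group variance `tau2`: each location's estimate is a precision-weighted blend of its own sample mean and the global mean, so sparsely observed groups borrow strength from the whole.

```python
def partial_pool(groups, tau2=1.0, sigma2=1.0):
    """Shrink each group's mean toward the global mean, weighted by precision.

    groups: dict mapping a group name to its list of observations.
    tau2, sigma2: hypothetical between- and within-group variances.
    """
    all_obs = [y for ys in groups.values() for y in ys]
    mu = sum(all_obs) / len(all_obs)          # global (pooled) mean
    est = {}
    for g, ys in groups.items():
        n = len(ys)
        ybar = sum(ys) / n                    # group sample mean
        # precision weighting: large groups trust their own mean,
        # small groups shrink toward the global mean
        w = (n / sigma2) / (n / sigma2 + 1 / tau2)
        est[g] = w * ybar + (1 - w) * mu
    return est
```

A group with a single observation is shrunk roughly halfway toward the global mean (under these default variances), while a group with many observations is barely moved.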
    Why Shallow Networks Struggle with Approximating and Learning High Frequency: A Numerical Study. (arXiv:2306.17301v1 [cs.LG])
    In this work, a comprehensive numerical study involving analysis and experiments shows why a two-layer neural network has difficulties handling high frequencies in approximation and learning when machine precision and computation cost are important factors in real practice. In particular, the following fundamental computational issues are investigated: (1) the best accuracy one can achieve given a finite machine precision, (2) the computation cost to achieve a given accuracy, and (3) stability with respect to perturbations. The key to the study is the spectral analysis of the corresponding Gram matrix of the activation functions which also shows how the properties of the activation function play a role in the picture.
    The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks. (arXiv:2306.17844v1 [cs.LG])
    Do neural networks, trained on well-understood algorithmic tasks, reliably rediscover known algorithms for solving those tasks? Several recent studies, on tasks ranging from group arithmetic to in-context linear regression, have suggested that the answer is yes. Using modular addition as a prototypical problem, we show that algorithm discovery in neural networks is sometimes more complex. Small changes to model hyperparameters and initializations can induce the discovery of qualitatively different algorithms from a fixed training set, and even parallel implementations of multiple such algorithms. Some networks trained to perform modular addition implement a familiar Clock algorithm; others implement a previously undescribed, less intuitive, but comprehensible procedure which we term the Pizza algorithm, or a variety of even more complex procedures. Our results show that even simple learning problems can admit a surprising diversity of solutions, motivating the development of new tools for characterizing the behavior of neural networks across their algorithmic phase space.
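The Clock algorithm the authors describe can be made concrete in a few lines. A minimal sketch (ours, not the paper's learned weights): represent each residue as a point on the unit circle, compose two inputs by multiplying their phases (which adds angles), and decode by the nearest residue embedding.

```python
import cmath
import math

def embed(a, p):
    # residue -> point on the unit circle (the "clock face")
    return cmath.exp(2j * math.pi * a / p)

def clock_add(a, b, p):
    z = embed(a, p) * embed(b, p)  # multiplying unit phases adds their angles
    # decode: the residue whose embedding best matches the composed phase
    return max(range(p), key=lambda k: (embed(k, p).conjugate() * z).real)
```

For example, `clock_add(5, 9, 12)` recovers `(5 + 9) % 12 = 2`; the networks in the paper discover rotation-like structure of this flavor, while "Pizza" networks compute the same function by a qualitatively different route.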
    Efficient uniform approximation using Random Vector Functional Link networks. (arXiv:2306.17501v1 [stat.ML])
    A Random Vector Functional Link (RVFL) network is a depth-2 neural network with random inner weights and biases. As only the outer weights of such architectures need to be learned, the learning process boils down to a linear optimization task, allowing one to sidestep the pitfalls of nonconvex optimization problems. In this paper, we prove that an RVFL with ReLU activation functions can approximate Lipschitz continuous functions provided its hidden layer is exponentially wide in the input dimension. Although it has been established before that such approximation can be achieved in $L_2$ sense, we prove it for $L_\infty$ approximation error and Gaussian inner weights. To the best of our knowledge, our result is the first of this kind. We give a nonasymptotic lower bound for the number of hidden layer nodes, depending on, among other things, the Lipschitz constant of the target function, the desired accuracy, and the input dimension. Our method of proof is rooted in probability theory and harmonic analysis.
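The RVFL construction is easy to sketch: freeze random Gaussian inner weights and random biases, then fit only the outer weights by ridge regression, which is the linear optimization step the abstract refers to. A toy 1-D version with arbitrarily chosen hyperparameters (not the paper's exponential-width regime):

```python
import math
import random

def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def rvfl_fit(xs, ys, m=25, lam=1e-4, seed=0):
    """Depth-2 RVFL: random frozen ReLU features; only outer weights learned."""
    rng = random.Random(seed)
    ws = [rng.gauss(0.0, 1.0) for _ in range(m)]     # random inner weights
    bs = [rng.uniform(-1.0, 1.0) for _ in range(m)]  # random biases
    F = [[max(0.0, w * x + b) for w, b in zip(ws, bs)] for x in xs]
    # ridge-regularized normal equations: (F^T F + lam I) c = F^T y
    A = [[sum(f[i] * f[j] for f in F) + (lam if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    rhs = [sum(f[i] * y for f, y in zip(F, ys)) for i in range(m)]
    c = gauss_solve(A, rhs)
    return lambda x: sum(ci * max(0.0, w * x + b)
                         for ci, w, b in zip(c, ws, bs))
```

Because only `c` is learned, the training problem is convex; fitting a Lipschitz target such as $|x|$ on $[-1,1]$ already works well with a few dozen random features.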
    A Survey on Blockchain-Based Federated Learning and Data Privacy. (arXiv:2306.17338v1 [cs.LG])
    Federated learning is a decentralized machine learning paradigm that allows multiple clients to collaborate by leveraging local computational power and the models transmission. This method reduces the costs and privacy concerns associated with centralized machine learning methods while ensuring data privacy by distributing training data across heterogeneous devices. On the other hand, federated learning has the drawback of data leakage due to the lack of privacy-preserving mechanisms employed during storage, transfer, and sharing, thus posing significant risks to data owners and suppliers. Blockchain technology has emerged as a promising technology for offering secure data-sharing platforms in federated learning, especially in Industrial Internet of Things (IIoT) settings. This survey aims to compare the performance and security of various data privacy mechanisms adopted in blockchain-based federated learning architectures. We conduct a systematic review of existing literature on secure data-sharing platforms for federated learning provided by blockchain technology, providing an in-depth overview of blockchain-based federated learning, its essential components, and discussing its principles, and potential applications. The primary contribution of this survey paper is to identify critical research questions and propose potential directions for future research in blockchain-based federated learning.
    Variational principle to regularize machine-learned density functionals: the non-interacting kinetic-energy functional. (arXiv:2306.17587v1 [physics.chem-ph])
    Practical density functional theory (DFT) owes its success to the groundbreaking work of Kohn and Sham that introduced the exact calculation of the non-interacting kinetic energy of the electrons using an auxiliary mean-field system. However, the full power of DFT will not be unleashed until the exact relationship between the electron density and the non-interacting kinetic energy is found. Various attempts have been made to approximate this functional, similar to the exchange--correlation functional, with much less success due to the larger contribution of kinetic energy and its more non-local nature. In this work we propose a new and efficient regularization method to train density functionals based on deep neural networks, with particular interest in the kinetic-energy functional. The method is tested on (effectively) one-dimensional systems, including the hydrogen chain, non-interacting electrons, and atoms of the first two periods, with excellent results. For the atomic systems, the generalizability of the regularization method is demonstrated by training also an exchange--correlation functional, and the contrasting nature of the two functionals is discussed from a machine-learning perspective.
    Bayesian Optimization with Formal Safety Guarantees via Online Conformal Prediction. (arXiv:2306.17815v1 [cs.LG])
Black-box zeroth-order optimization is a central primitive for applications in fields as diverse as finance, physics, and engineering. In a common formulation of this problem, a designer sequentially attempts candidate solutions, receiving noisy feedback on the value of each attempt from the system. In this paper, we study scenarios in which feedback is also provided on the safety of the attempted solution, and the optimizer is constrained to limit the number of unsafe solutions that are tried throughout the optimization process. Focusing on methods based on Bayesian optimization (BO), prior art has introduced an optimization scheme -- referred to as SAFEOPT -- that is guaranteed not to select any unsafe solution with a controllable probability over feedback noise as long as strict assumptions on the safety constraint function are met. In this paper, a novel BO-based approach is introduced that satisfies safety requirements irrespective of properties of the constraint function. This strong theoretical guarantee is obtained at the cost of allowing for an arbitrary, controllable but non-zero, rate of violation of the safety constraint. The proposed method, referred to as SAFE-BOCP, builds on online conformal prediction (CP) and is specialized to the cases in which feedback on the safety constraint is either noiseless or noisy. Experimental results on synthetic and real-world data validate the advantages and flexibility of the proposed SAFE-BOCP.
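The online-CP mechanism behind a distribution-free violation-rate guarantee can be sketched in toy form (ours, not the paper's algorithm): adaptively raise or lower a safety threshold so that the long-run rate of unsafe attempts tracks a target level $\alpha$, with no assumptions on how the noisy safety scores relate to true safety.

```python
import random

def online_safe_selection(T=20000, alpha=0.1, gamma=0.01, seed=1):
    """Toy online conformal control of the unsafe-attempt rate.

    A candidate is attempted when its noisy safety score clears the threshold
    lam; the threshold is updated each round so that the empirical violation
    rate converges to alpha, whatever the score distribution is.
    """
    rng = random.Random(seed)
    lam, errors = 0.5, 0
    for _ in range(T):
        score = rng.random()               # noisy predicted safety in [0, 1]
        if score >= lam:                   # attempt the candidate
            unsafe = rng.random() > score  # true safety only loosely tied to score
            err = 1 if unsafe else 0
        else:
            err = 0                        # skipped candidates cannot violate safety
        errors += err
        lam += gamma * (err - alpha)       # conformal-style threshold update
    return errors / T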
    Designing Stable Neural Networks using Convex Analysis and ODEs. (arXiv:2306.17332v1 [cs.LG])
    Motivated by classical work on the numerical integration of ordinary differential equations we present a ResNet-styled neural network architecture that encodes non-expansive (1-Lipschitz) operators, as long as the spectral norms of the weights are appropriately constrained. This is to be contrasted with the ordinary ResNet architecture which, even if the spectral norms of the weights are constrained, has a Lipschitz constant that, in the worst case, grows exponentially with the depth of the network. Further analysis of the proposed architecture shows that the spectral norms of the weights can be further constrained to ensure that the network is an averaged operator, making it a natural candidate for a learned denoiser in Plug-and-Play algorithms. Using a novel adaptive way of enforcing the spectral norm constraints, we show that, even with these constraints, it is possible to train performant networks. The proposed architecture is applied to the problem of adversarially robust image classification, to image denoising, and finally to the inverse problem of deblurring.
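A standard building block for such constraints is estimating a weight matrix's spectral norm by power iteration and rescaling the matrix whenever the norm exceeds 1, so the linear map is 1-Lipschitz. A small pure-Python sketch (the paper's adaptive enforcement scheme is more involved):

```python
import math

def spectral_norm(W, iters=200):
    """Largest singular value of W, estimated by power iteration on W^T W."""
    m, n = len(W), len(W[0])
    v = [1.0] * n
    for _ in range(iters):
        u = [sum(W[i][j] * v[j] for j in range(n)) for i in range(m)]  # u = W v
        v = [sum(W[i][j] * u[i] for i in range(m)) for j in range(n)]  # v = W^T u
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        v = [x / norm for x in v]
    u = [sum(W[i][j] * v[j] for j in range(n)) for i in range(m)]
    return math.sqrt(sum(x * x for x in u))

def constrain(W):
    """Rescale W so its spectral norm is at most 1 (a non-expansive linear map)."""
    s = spectral_norm(W)
    return W if s <= 1.0 else [[x / s for x in row] for row in W]
```

Constraining each layer's spectral norm this way bounds the Lipschitz constant of a single layer; the abstract's point is that for a plain ResNet these per-layer bounds still compound exponentially with depth, which the proposed architecture avoids.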
    Navigation of micro-robot swarms for targeted delivery using reinforcement learning. (arXiv:2306.17598v1 [cs.RO])
Microrobotics is quickly emerging as a promising technological solution for many medical treatments, with a focus on targeted drug delivery. Microrobots are effective when working in swarms, whose members' individual control is mostly infeasible owing to their minute size. Controlling a number of robots with a single controller is thus important, and artificial intelligence can help us perform this task successfully. In this work, we use the Reinforcement Learning (RL) algorithms Proximal Policy Optimization (PPO) and Robust Policy Optimization (RPO) to navigate swarms of 4, 9 and 16 microswimmers under hydrodynamic effects, controlled by their orientation, towards a circular absorbing target. We evaluate the performance of both PPO and RPO in limited-state-information scenarios and also test their robustness to random target location and size. We use curriculum learning to improve performance and demonstrate it by learning to navigate a swarm of 25 swimmers and steering the swarm, exemplifying the manoeuvring capabilities of the RL model.
    Thompson sampling for improved exploration in GFlowNets. (arXiv:2306.17693v1 [cs.LG])
    Generative flow networks (GFlowNets) are amortized variational inference algorithms that treat sampling from a distribution over compositional objects as a sequential decision-making problem with a learnable action policy. Unlike other algorithms for hierarchical sampling that optimize a variational bound, GFlowNet algorithms can stably run off-policy, which can be advantageous for discovering modes of the target distribution. Despite this flexibility in the choice of behaviour policy, the optimal way of efficiently selecting trajectories for training has not yet been systematically explored. In this paper, we view the choice of trajectories for training as an active learning problem and approach it using Bayesian techniques inspired by methods for multi-armed bandits. The proposed algorithm, Thompson sampling GFlowNets (TS-GFN), maintains an approximate posterior distribution over policies and samples trajectories from this posterior for training. We show in two domains that TS-GFN yields improved exploration and thus faster convergence to the target distribution than the off-policy exploration strategies used in past work.
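The bandit primitive that TS-GFN adapts can be sketched in its simplest Beta-Bernoulli form (a generic illustration of Thompson sampling, not the GFlowNet algorithm itself): sample a plausible mean for each option from its posterior, then act greedily on the sampled values, so exploration naturally concentrates as the posterior sharpens.

```python
import random

def thompson_step(successes, failures, rng):
    """One Thompson-sampling step for Bernoulli arms: draw a plausible mean
    for each arm from its Beta posterior, then act greedily on the draws."""
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda i: samples[i])

def run(T=5000, probs=(0.3, 0.7), seed=0):
    """Simulate T rounds on two hypothetical arms with the given success rates."""
    rng = random.Random(seed)
    s = [0] * len(probs)
    f = [0] * len(probs)
    for _ in range(T):
        a = thompson_step(s, f, rng)
        if rng.random() < probs[a]:
            s[a] += 1
        else:
            f[a] += 1
    return s, f
```

TS-GFN lifts this idea from arms to trajectory-generating policies, maintaining an approximate posterior over policies and sampling training trajectories from it.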
    Global Optimality in Bivariate Gradient-based DAG Learning. (arXiv:2306.17378v1 [cs.LG])
    Recently, a new class of non-convex optimization problems motivated by the statistical problem of learning an acyclic directed graphical model from data has attracted significant interest. While existing work uses standard first-order optimization schemes to solve this problem, proving the global optimality of such approaches has proven elusive. The difficulty lies in the fact that unlike other non-convex problems in the literature, this problem is not "benign", and possesses multiple spurious solutions that standard approaches can easily get trapped in. In this paper, we prove that a simple path-following optimization scheme globally converges to the global minimum of the population loss in the bivariate setting.
    The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit. (arXiv:2306.17759v1 [stat.ML])
    In deep learning theory, the covariance matrix of the representations serves as a proxy to examine the network's trainability. Motivated by the success of Transformers, we study the covariance matrix of a modified Softmax-based attention model with skip connections in the proportional limit of infinite-depth-and-width. We show that at initialization the limiting distribution can be described by a stochastic differential equation (SDE) indexed by the depth-to-width ratio. To achieve a well-defined stochastic limit, the Transformer's attention mechanism is modified by centering the Softmax output at identity, and scaling the Softmax logits by a width-dependent temperature parameter. We examine the stability of the network through the corresponding SDE, showing how the scale of both the drift and diffusion can be elegantly controlled with the aid of residual connections. The existence of a stable SDE implies that the covariance structure is well-behaved, even for very large depth and width, thus preventing the notorious issues of rank degeneracy in deep attention models. Finally, we show, through simulations, that the SDE provides a surprisingly good description of the corresponding finite-size model. We coin the name shaped Transformer for these architectural modifications.
    Learning Bilinear Models of Actuated Koopman Generators from Partially-Observed Trajectories. (arXiv:2209.09977v2 [math.DS] UPDATED)
    Data-driven models for nonlinear dynamical systems based on approximating the underlying Koopman operator or generator have proven to be successful tools for forecasting, feature learning, state estimation, and control. It has become well known that the Koopman generators for control-affine systems also have affine dependence on the input, leading to convenient finite-dimensional bilinear approximations of the dynamics. Yet there are still two main obstacles that limit the scope of current approaches for approximating the Koopman generators of systems with actuation. First, the performance of existing methods depends heavily on the choice of basis functions over which the Koopman generator is to be approximated; and there is currently no universal way to choose them for systems that are not measure preserving. Secondly, if we do not observe the full state, we may not gain access to a sufficiently rich collection of such functions to describe the dynamics. This is because the commonly used method of forming time-delayed observables fails when there is actuation. To remedy these issues, we write the dynamics of observables governed by the Koopman generator as a bilinear hidden Markov model, and determine the model parameters using the expectation-maximization (EM) algorithm. The E-step involves a standard Kalman filter and smoother, while the M-step resembles control-affine dynamic mode decomposition for the generator. We demonstrate the performance of this method on three examples, including recovery of a finite-dimensional Koopman-invariant subspace for an actuated system with a slow manifold; estimation of Koopman eigenfunctions for the unforced Duffing equation; and model-predictive control of a fluidic pinball system based only on noisy observations of lift and drag.
    Unsupervised Text Embedding Space Generation Using Generative Adversarial Networks for Text Synthesis. (arXiv:2306.17181v1 [cs.CL])
The Generative Adversarial Network (GAN) is a model for data synthesis, which creates plausible data through the competition of a generator and a discriminator. Although GAN application to image synthesis is extensively studied, it has inherent limitations in natural language generation. Because natural language is composed of discrete tokens, a generator has difficulty updating its gradient through backpropagation; therefore, most text-GAN studies generate sentences starting with a random token based on a reward system. Thus, the generators of previous studies are pre-trained in an autoregressive way before adversarial training, causing data memorization, whereby synthesized sentences reproduce the training data. In this paper, we synthesize sentences using a framework similar to the original GAN. More specifically, we propose Text Embedding Space Generative Adversarial Networks (TESGAN), which generate continuous text embedding spaces instead of discrete tokens to solve the gradient backpropagation problem. Furthermore, TESGAN conducts unsupervised learning which does not directly refer to the text of the training data, to overcome the data memorization issue. By adopting this novel method, TESGAN can synthesize new sentences, showing the potential of unsupervised learning for text synthesis. We expect to see extended research combining Large Language Models with this new perspective of viewing text as a continuous space.
    ReLU Neural Networks, Polyhedral Decompositions, and Persistent Homology. (arXiv:2306.17418v1 [math.AT])
    A ReLU neural network leads to a finite polyhedral decomposition of input space and a corresponding finite dual graph. We show that while this dual graph is a coarse quantization of input space, it is sufficiently robust that it can be combined with persistent homology to detect homological signals of manifolds in the input space from samples. This property holds for a variety of networks trained for a wide range of purposes that have nothing to do with this topological application. We found this feature to be surprising and interesting; we hope it will also be useful.
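The decomposition in question is easy to make concrete for a single layer: the binary pattern of which ReLUs fire is constant on each polyhedral cell of input space, and the dual graph connects cells whose patterns differ. A minimal sketch (our illustration, not the paper's construction):

```python
def activation_pattern(weights, biases, x):
    """Which ReLUs fire at input x.

    The pattern is constant on each polyhedral cell of the input-space
    decomposition induced by the layer, so it serves as a cell label.
    """
    return tuple(int(sum(w * xi for w, xi in zip(ws, x)) + b > 0)
                 for ws, b in zip(weights, biases))
```

Two inputs with the same pattern lie in the same linear region of the network; sampling inputs and collecting distinct patterns enumerates the (visited) cells of the decomposition on which the paper's persistent-homology computation is built.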
    iSCAN: Identifying Causal Mechanism Shifts among Nonlinear Additive Noise Models. (arXiv:2306.17361v1 [cs.LG])
    Structural causal models (SCMs) are widely used in various disciplines to represent causal relationships among variables in complex systems. Unfortunately, the true underlying directed acyclic graph (DAG) structure is often unknown, and determining it from observational or interventional data remains a challenging task. However, in many situations, the end goal is to identify changes (shifts) in causal mechanisms between related SCMs rather than recovering the entire underlying DAG structure. Examples include analyzing gene regulatory network structure changes between healthy and cancerous individuals or understanding variations in biological pathways under different cellular contexts. This paper focuses on identifying $\textit{functional}$ mechanism shifts in two or more related SCMs over the same set of variables -- $\textit{without estimating the entire DAG structure of each SCM}$. Prior work under this setting assumed linear models with Gaussian noises; instead, in this work we assume that each SCM belongs to the more general class of nonlinear additive noise models (ANMs). A key contribution of this work is to show that the Jacobian of the score function for the $\textit{mixture distribution}$ allows for identification of shifts in general non-parametric functional mechanisms. Once the shifted variables are identified, we leverage recent work to estimate the structural differences, if any, for the shifted variables. Experiments on synthetic and real-world data are provided to showcase the applicability of this approach.
    $\lambda$-AC: Learning latent decision-aware models for reinforcement learning in continuous state-spaces. (arXiv:2306.17366v1 [cs.LG])
    The idea of decision-aware model learning, that models should be accurate where it matters for decision-making, has gained prominence in model-based reinforcement learning. While promising theoretical results have been established, the empirical performance of algorithms leveraging a decision-aware loss has been lacking, especially in continuous control problems. In this paper, we present a study on the necessary components for decision-aware reinforcement learning models and we showcase design choices that enable well-performing algorithms. To this end, we provide a theoretical and empirical investigation into prominent algorithmic ideas in the field. We highlight that empirical design decisions established in the MuZero line of works are vital to achieving good performance for related algorithms, and we showcase differences in behavior between different instantiations of value-aware algorithms in stochastic environments. Using these insights, we propose the Latent Model-Based Decision-Aware Actor-Critic framework ($\lambda$-AC) for decision-aware model-based reinforcement learning in continuous state-spaces and highlight important design choices in different environments.
    Multigrid-Augmented Deep Learning for the Helmholtz Equation: Better Scalability with Compact Implicit Layers. (arXiv:2306.17486v1 [cs.LG])
    We present a deep learning-based iterative approach to solve the discrete heterogeneous Helmholtz equation for high wavenumbers. Combining classical iterative multigrid solvers and convolutional neural networks (CNNs) via preconditioning, we obtain a learned neural solver that is faster and scales better than a standard multigrid solver. Our approach offers three main contributions over previous neural methods of this kind. First, we construct a multilevel U-Net-like encoder-solver CNN with an implicit layer on the coarsest grid of the U-Net, where convolution kernels are inverted. This alleviates the field of view problem in CNNs and allows better scalability. Second, we improve upon the previous CNN preconditioner in terms of the number of parameters, computation time, and convergence rates. Third, we propose a multiscale training approach that enables the network to scale to problems of previously unseen dimensions while still maintaining a reasonable training procedure. Our encoder-solver architecture can be used to generalize over different slowness models of various difficulties and is efficient at solving for many right-hand sides per slowness model. We demonstrate the benefits of our novel architecture with numerical experiments on a variety of heterogeneous two-dimensional problems at high wavenumbers.
    Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting. (arXiv:2306.17563v1 [cs.IR])
    Ranking documents using Large Language Models (LLMs) by directly feeding the query and candidate documents into the prompt is an interesting and practical problem. However, there has been limited success so far, as researchers have found it difficult to outperform fine-tuned baseline rankers on benchmark datasets. We analyze pointwise and listwise ranking prompts used by existing methods and argue that off-the-shelf LLMs do not fully understand these ranking formulations, possibly due to the nature of how LLMs are trained. In this paper, we propose to significantly reduce the burden on LLMs by using a new technique called Pairwise Ranking Prompting (PRP). Our results are the first in the literature to achieve state-of-the-art ranking performance on standard benchmarks using moderate-sized open-sourced LLMs. On TREC-DL2020, PRP based on the Flan-UL2 model with 20B parameters outperforms the previous best approach in the literature, which is based on the blackbox commercial GPT-4 that has 50x (estimated) model size, by over 5% at NDCG@1. On TREC-DL2019, PRP is only inferior to the GPT-4 solution on the NDCG@5 and NDCG@10 metrics, while outperforming other existing solutions, such as InstructGPT which has 175B parameters, by over 10% for nearly all ranking metrics. Furthermore, we propose several variants of PRP to improve efficiency and show that it is possible to achieve competitive results even with linear complexity. We also discuss other benefits of PRP, such as supporting both generation and scoring LLM APIs, as well as being insensitive to input ordering.
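The aggregation step of such a pairwise scheme is simple to sketch. Here `llm_prefers` is a hypothetical stand-in for prompting an LLM with one pair of candidates; both orderings of each pair are queried, since LLM judgments can be sensitive to input order, and documents are ranked by total pairwise wins:

```python
def prp_rank(query, docs, llm_prefers):
    """Rank docs by counting pairwise wins over all ordered pairs.

    llm_prefers(query, a, b) is a hypothetical callable standing in for a
    pairwise LLM prompt: True if a is judged more relevant than b.
    """
    wins = {d: 0 for d in docs}
    for i, a in enumerate(docs):
        for j, b in enumerate(docs):
            if i == j:
                continue
            # each unordered pair is queried in both orders (i, j) and (j, i)
            if llm_prefers(query, a, b):
                wins[a] += 1
            else:
                wins[b] += 1
    return sorted(docs, key=lambda d: -wins[d])
```

This all-pairs variant is quadratic in the number of candidates; the abstract's efficiency variants reduce this, down to linear complexity, at some cost in quality.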
    Act3D: Infinite Resolution Action Detection Transformer for Robotic Manipulation. (arXiv:2306.17817v1 [cs.RO])
3D perceptual representations are well suited for robot manipulation as they easily encode occlusions and simplify spatial reasoning. Many manipulation tasks require high spatial precision in end-effector pose prediction, typically demanding high-resolution 3D perceptual grids that are computationally expensive to process. As a result, most manipulation policies operate directly in 2D, foregoing 3D inductive biases. In this paper, we propose Act3D, a manipulation policy Transformer that casts 6-DoF keypose prediction as 3D detection with adaptive spatial computation. It takes as input 3D feature clouds unprojected from one or more camera views, iteratively samples 3D point grids in free space in a coarse-to-fine manner, featurizes them using relative spatial attention to the physical feature cloud, and selects the best feature point for end-effector pose prediction. Act3D sets a new state-of-the-art in RLBench, an established manipulation benchmark. Our model achieves 10% absolute improvement over the previous SOTA 2D multi-view policy on 74 RLBench tasks and 22% absolute improvement with 3x less compute over the previous SOTA 3D policy. In thorough ablations, we show the importance of relative spatial attention, large-scale vision-language pre-trained 2D backbones, and weight tying across coarse-to-fine attentions. Code and videos are available at our project site: https://act3d.github.io/.
    Optimal Execution Using Reinforcement Learning. (arXiv:2306.17178v1 [q-fin.TR])
This work is about optimal order execution, where a large order is split into several small orders to minimize the implementation shortfall. Based on the diversity of cryptocurrency exchanges, we attempt, for the first time, to extract cross-exchange signals by aligning data from multiple exchanges. Unlike most previous studies, which focused on using single-exchange information, we discuss the impact of cross-exchange signals on the agent's decision-making in the optimal execution problem. Experimental results show that cross-exchange signals can provide additional information for the optimal execution of cryptocurrency to facilitate the optimal execution process.
    Subgraph Stationary Hardware-Software Inference Co-Design. (arXiv:2306.17266v1 [cs.DC])
A growing number of applications depend on Machine Learning (ML) functionality and benefit from both higher-quality ML predictions and better timeliness (latency) at the same time. A growing body of research in the computer architecture, ML, and systems software literature focuses on reaching better latency-accuracy tradeoffs for ML models. Efforts include compression, quantization, pruning, early-exit models, mixed DNN precision, as well as ML inference accelerator designs that minimize latency and energy, while preserving delivered accuracy. All of them, however, yield improvements for a single static point in the latency-accuracy tradeoff space. We make a case for applications that operate in dynamically changing deployment scenarios, where no single static point is optimal. We draw on a recently proposed weight-shared SuperNet mechanism to enable serving a stream of queries that uses (activates) different SubNets within this weight-shared construct. This creates an opportunity to exploit the inherent temporal locality with our proposed SubGraph Stationary (SGS) optimization. We take a hardware-software co-design approach with a real implementation of SGS in SushiAccel and the implementation of a software scheduler, SushiSched, controlling which SubNets to serve and what to cache in real time. Combined, they are vertically integrated into SUSHI, an inference serving stack. For the stream of queries, SUSHI yields up to 25% improvement in latency and a 0.98% increase in served accuracy. SUSHI can achieve up to 78.7% off-chip energy savings.
    Empirical Interpretation of the Relationship Between Speech Acoustic Context and Emotion Recognition. (arXiv:2306.17500v1 [cs.SD])
Speech emotion recognition (SER) is vital for obtaining emotional intelligence and understanding the contextual meaning of speech. Variations of consonant-vowel (CV) phonemic boundaries can enrich acoustic context with linguistic cues, which impacts SER. In practice, speech emotions are treated as single labels over an acoustic segment for a given time duration. However, phone boundaries within speech are not discrete events, therefore the perceived emotion state should also be distributed over potentially continuous time-windows. This research explores the implication of acoustic context and phone boundaries on local markers for SER using an attention-based approach. The benefits of using a distributed approach to speech emotion understanding are supported by the results of cross-corpora analysis experiments. In our experiments, phones and words are mapped to the attention vectors, along with the fundamental frequency, to observe the overlapping distributions and thereby the relationship between acoustic context and emotion. This work aims to bridge psycholinguistic theory research with computational modelling for SER.
    Locking On: Leveraging Dynamic Vehicle-Imposed Motion Constraints to Improve Visual Localization. (arXiv:2306.17529v1 [cs.RO])
    Most 6-DoF localization and SLAM systems use static landmarks but ignore dynamic objects because they cannot be usefully incorporated into a typical pipeline. Where dynamic objects have been incorporated, typical approaches have attempted relatively sophisticated identification and localization of these objects, limiting their robustness or general utility. In this research, we propose a middle ground, demonstrated in the context of autonomous vehicles, using dynamic vehicles to provide limited pose constraint information in a 6-DoF frame-by-frame PnP-RANSAC localization pipeline. We refine initial pose estimates with a motion model and propose a method for calculating the predicted quality of future pose estimates, triggered based on whether or not the autonomous vehicle's motion is constrained by the relative frame-to-frame location of dynamic vehicles in the environment. Our approach detects and identifies suitable dynamic vehicles to define these pose constraints to modify a pose filter, resulting in improved recall across a range of localization tolerances from $0.25m$ to $5m$, compared to a state-of-the-art baseline single image PnP method and its vanilla pose filtering. Our constraint detection system is active for approximately $35\%$ of the time on the Ford AV dataset and localization is particularly improved when the constraint detection is active.
    Kernel $\epsilon$-Greedy for Contextual Bandits. (arXiv:2306.17329v1 [stat.ML])
    We consider a kernelized version of the $\epsilon$-greedy strategy for contextual bandits. More precisely, in a setting with finitely many arms, we consider that the mean reward functions lie in a reproducing kernel Hilbert space (RKHS). We propose an online weighted kernel ridge regression estimator for the reward functions. Under some conditions on the exploration probability sequence, $\{\epsilon_t\}_t$, and choice of the regularization parameter, $\{\lambda_t\}_t$, we show that the proposed estimator is consistent. We also show that for any choice of kernel and the corresponding RKHS, we achieve a sub-linear regret rate depending on the intrinsic dimensionality of the RKHS. Furthermore, we achieve the optimal regret rate of $\sqrt{T}$ under a margin condition for finite-dimensional RKHS.
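The mechanics of the strategy described above can be illustrated with a minimal sketch. This is not the paper's estimator: it uses an invented RBF kernel and bandwidth, two made-up scalar reward functions, a fixed exploration probability rather than a decaying sequence, and a naive from-scratch kernel ridge refit instead of the proposed online weighted estimator.

```python
import math, random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def rbf(x, y, h=0.2):
    return math.exp(-((x - y) / h) ** 2)

class KernelArm:
    """Kernel ridge regression estimate of one arm's mean-reward function."""
    def __init__(self, lam=0.01):
        self.lam, self.xs, self.ys, self.alpha = lam, [], [], []
    def update(self, x, y):
        self.xs.append(x); self.ys.append(y)
        # Dual solution: alpha = (K + lam I)^{-1} y
        K = [[rbf(a, b) + (self.lam if i == j else 0.0)
              for j, b in enumerate(self.xs)] for i, a in enumerate(self.xs)]
        self.alpha = solve(K, self.ys)
    def predict(self, x):
        if not self.xs:
            return 0.0
        return sum(a * rbf(xi, x) for a, xi in zip(self.alpha, self.xs))

random.seed(0)
true_reward = [lambda x: x, lambda x: 1.0 - x]       # two toy arms, noiseless
arms = [KernelArm(), KernelArm()]
for t in range(60):
    x = random.random()                               # context in [0, 1]
    if random.random() < 0.2:
        a = random.randrange(2)                       # explore
    else:
        a = max(range(2), key=lambda i: arms[i].predict(x))  # exploit
    arms[a].update(x, true_reward[a](x))

pred_hi = arms[0].predict(0.9)   # arm 0's true mean reward at x=0.9 is 0.9
```

With noiseless rewards the estimate at a well-visited context should track the true reward function closely, which is the consistency property the abstract refers to.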
    Provable Robust Watermarking for AI-Generated Text. (arXiv:2306.17439v1 [cs.CL])
    As AI-generated text increasingly resembles human-written content, the ability to detect machine-generated text becomes crucial. To address this challenge, we present GPTWatermark, a robust and high-quality solution designed to ascertain whether a piece of text originates from a specific model. Our approach extends existing watermarking strategies and employs a fixed group design to enhance robustness against editing and paraphrasing attacks. We show that our watermarked language model enjoys strong provable guarantees on generation quality, correctness in detection, and security against evasion attacks. Experimental results on various large language models (LLMs) and diverse datasets demonstrate that our method achieves superior detection accuracy and comparable generation quality in perplexity, thus promoting the responsible use of LLMs.
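The detection side of a fixed-group watermark can be sketched as a one-proportion z-test on "green-list" token counts. The vocabulary size, 50/50 split, and boost probability below are invented for illustration; this shows only the detection statistic, not the paper's generation procedure or its provable guarantees.

```python
import math, random

VOCAB_SIZE = 1000
key = random.Random(42)                                  # fixed key shared by generator and detector
green = set(key.sample(range(VOCAB_SIZE), VOCAB_SIZE // 2))
green_sorted = sorted(green)
red = sorted(set(range(VOCAB_SIZE)) - green)

def z_score(tokens, gamma=0.5):
    """One-proportion z-test: how far above chance is the green-token count?"""
    g = sum(1 for t in tokens if t in green)
    n = len(tokens)
    return (g - gamma * n) / math.sqrt(n * gamma * (1 - gamma))

def sample_stream(n, p_green, rng):
    """Toy token stream choosing a green token with probability p_green."""
    return [rng.choice(green_sorted) if rng.random() < p_green else rng.choice(red)
            for _ in range(n)]

rng = random.Random(0)
watermarked = sample_stream(200, 0.9, rng)   # generator softly boosts green tokens
natural = sample_stream(200, 0.5, rng)       # unmarked text hits green at chance rate
```

A large z-score flags the text as watermarked; because the green list is a fixed partition, paraphrasing or editing must flip many individual tokens to push the statistic back to chance.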
    Guided Deep Generative Model-based Spatial Regularization for Multiband Imaging Inverse Problems. (arXiv:2306.17197v1 [eess.IV])
When adopting a model-based formulation, solving inverse problems encountered in multiband imaging requires defining spatial and spectral regularizations. In most of the works of the literature, spectral information is extracted from the observations directly to derive data-driven spectral priors. Conversely, the choice of the spatial regularization often boils down to the use of conventional penalizations (e.g., total variation) promoting expected features of the reconstructed image (e.g., piecewise constant). In this work, we propose a generic framework able to capitalize on an auxiliary acquisition of high spatial resolution to derive tailored data-driven spatial regularizations. This approach leverages the ability of deep learning to extract high level features. More precisely, the regularization is conceived as a deep generative network able to encode spatial semantic features contained in this auxiliary image of high spatial resolution. To illustrate the versatility of this approach, it is instantiated to conduct two particular tasks, namely multiband image fusion and multiband image inpainting. Experimental results obtained on these two tasks demonstrate the benefit of this class of informed regularizations when compared to more conventional ones.
    Enterprise Disk Drive Scrubbing Based on Mondrian Conformal Predictors. (arXiv:2306.17169v1 [cs.DC])
Disk scrubbing is a process aimed at resolving read errors on disks by reading data from the disk. However, scrubbing the entire storage array at once can adversely impact system performance, particularly during periods of high input/output operations. Additionally, the continuous reading of data from disks when scrubbing can result in wear and tear, especially on larger capacity disks, due to the significant time and energy consumption involved. To address these issues, we propose a selective disk scrubbing method that enhances the overall reliability and power efficiency in data centers. Our method employs a Machine Learning model based on Mondrian Conformal prediction to identify specific disks for scrubbing, by proactively predicting the health status of each disk in the storage pool, forecasting n-days in advance, and using an open-source dataset. For disks predicted as non-healthy, we mark them for replacement without further action. For healthy drives, we create a set and quantify their relative health across the entire storage pool based on the predictor's confidence. This enables us to prioritize selective scrubbing for drives with established scrubbing frequency based on the scrub cycle. The method we propose provides an efficient and dependable solution for managing enterprise disk drives. By scrubbing just 22.7% of the total storage disks, we can achieve optimized energy consumption and reduce the carbon footprint of the data center.
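The selection policy described above can be sketched as a small triage function. The disk-record fields (`id`, `pred`, `conf`) and the ranking rule are invented stand-ins for the Mondrian conformal predictor's output; the sketch only shows the split into "replace", "scrub", and "skip" sets.

```python
def plan_scrubbing(disks, scrub_fraction):
    """Triage disks from predictor output: replace non-healthy, scrub a ranked subset."""
    replace = [d["id"] for d in disks if d["pred"] != "healthy"]
    # Rank healthy disks by the predictor's confidence: least confident first,
    # so limited scrubbing effort goes to the drives the model is least sure about.
    healthy = sorted((d for d in disks if d["pred"] == "healthy"),
                     key=lambda d: d["conf"])
    k = round(scrub_fraction * len(disks))
    scrub = [d["id"] for d in healthy[:k]]
    return replace, scrub

fleet = [
    {"id": "d0", "pred": "healthy", "conf": 0.99},
    {"id": "d1", "pred": "failing", "conf": 0.80},
    {"id": "d2", "pred": "healthy", "conf": 0.60},
    {"id": "d3", "pred": "healthy", "conf": 0.90},
]
replace, scrub = plan_scrubbing(fleet, 0.25)   # scrub roughly 25% of the fleet
```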
    DisasterResponseGPT: Large Language Models for Accelerated Plan of Action Development in Disaster Response Scenarios. (arXiv:2306.17271v1 [cs.LG])
    The development of plans of action in disaster response scenarios is a time-consuming process. Large Language Models (LLMs) offer a powerful solution to expedite this process through in-context learning. This study presents DisasterResponseGPT, an algorithm that leverages LLMs to generate valid plans of action quickly by incorporating disaster response and planning guidelines in the initial prompt. In DisasterResponseGPT, users input the scenario description and receive a plan of action as output. The proposed method generates multiple plans within seconds, which can be further refined following the user's feedback. Preliminary results indicate that the plans of action developed by DisasterResponseGPT are comparable to human-generated ones while offering greater ease of modification in real-time. This approach has the potential to revolutionize disaster response operations by enabling rapid updates and adjustments during the plan's execution.
    FedBone: Towards Large-Scale Federated Multi-Task Learning. (arXiv:2306.17465v1 [cs.LG])
Heterogeneous federated multi-task learning (HFMTL) is a federated learning technique that combines heterogeneous tasks of different clients to achieve more accurate, comprehensive predictions. In real-world applications, visual and natural language tasks typically require large-scale models to extract high-level abstract features. However, large-scale models cannot be directly applied to existing federated multi-task learning methods. Existing HFMTL methods also disregard the impact of gradient conflicts on multi-task optimization during the federated aggregation process. In this work, we propose an innovative framework called FedBone, which enables the construction of large-scale models with better generalization from the perspective of server-client split learning and gradient projection. We split the entire model into two components: a large-scale general model (referred to as the general model) on the cloud server and multiple task-specific models (referred to as the client model) on edge clients, solving the problem of insufficient computing power on edge clients. The conflicting gradient projection technique is used to enhance the generalization of the large-scale general model between different tasks. The proposed framework is evaluated on two benchmark datasets and a real ophthalmic dataset. Comprehensive results demonstrate that FedBone efficiently adapts to heterogeneous local tasks of each client and outperforms existing federated learning algorithms in most dense prediction and classification tasks with off-the-shelf computational resources on the client side.
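The abstract does not spell out FedBone's exact projection; the sketch below shows the generic PCGrad-style operation such conflicting-gradient techniques build on: when two task gradients have a negative inner product, one is projected onto the normal plane of the other. Gradients are plain Python lists and the example vectors are made up.

```python
def project_conflicting(g, g_other):
    """Remove g's component along g_other when the two task gradients conflict."""
    dot = sum(a * b for a, b in zip(g, g_other))
    if dot >= 0:
        return list(g)                           # no conflict: leave g unchanged
    norm_sq = sum(b * b for b in g_other)
    # Subtract the (negative) projection of g onto g_other.
    return [a - (dot / norm_sq) * b for a, b in zip(g, g_other)]

g1 = [1.0, -1.0]      # task-A gradient
g2 = [0.0, 1.0]       # task-B gradient; <g1, g2> = -1, so they conflict
g1_projected = project_conflicting(g1, g2)
```

After projection the surviving component of `g1` no longer pushes against task B's update direction, which is the mechanism for reducing gradient conflict during aggregation.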
    Machine learning for sports betting: should predictive models be optimised for accuracy or calibration?. (arXiv:2303.06021v3 [cs.LG] UPDATED)
    Sports betting's recent federal legalisation in the USA coincides with the golden age of machine learning. If bettors can leverage data to reliably predict the probability of an outcome, they can recognise when the bookmaker's odds are in their favour. As sports betting is a multi-billion dollar industry in the USA alone, identifying such opportunities could be extremely lucrative. Many researchers have applied machine learning to the sports outcome prediction problem, generally using accuracy to evaluate the performance of predictive models. We hypothesise that for the sports betting problem, model calibration is more important than accuracy. To test this hypothesis, we train models on NBA data over several seasons and run betting experiments on a single season, using published odds. We show that optimising the predictive model for calibration leads to greater returns than optimising for accuracy, on average (return on investment of $+34.69\%$ versus $-35.17\%$) and in the best case ($+36.93\%$ versus $+5.56\%$). These findings suggest that for sports betting (or any probabilistic decision-making problem), calibration is a more important metric than accuracy. Sports bettors who wish to increase profits should therefore optimise their predictive model for calibration.
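The value-betting logic behind this comparison can be sketched with toy numbers: a flat-stake bettor wagers whenever model probability times decimal odds exceeds 1. The probabilities, odds, and 3-point bookmaker margin below are invented, not the paper's NBA data; the point is that a well-ranked but overconfident model sees phantom value everywhere, while a calibrated model correctly declines margin-laden odds.

```python
def expected_roi(p_true, p_model, odds, stake=1.0):
    """Expected return of flat-stake bets placed whenever the model sees value."""
    placed = [(pt, o) for pt, pm, o in zip(p_true, p_model, odds) if pm * o > 1.0]
    if not placed:
        return 0.0
    # Expected profit per bet: win pays (o - 1), loss costs the stake.
    profit = sum(pt * (o - 1) * stake - (1 - pt) * stake for pt, o in placed)
    return profit / (len(placed) * stake)

p_true = [0.60, 0.55, 0.50, 0.45, 0.40]         # true outcome probabilities
odds = [1 / (p + 0.03) for p in p_true]          # bookmaker embeds a 3-point margin
calibrated = list(p_true)                        # perfectly calibrated model
overconfident = [min(1.0, p * 1.3) for p in p_true]  # same ranking, inflated probabilities

roi_cal = expected_roi(p_true, calibrated, odds)
roi_over = expected_roi(p_true, overconfident, odds)
```

Here the calibrated model places no bets (every price is below fair value), while the overconfident model bets on all five outcomes and loses the margin on each, illustrating why calibration rather than accuracy drives returns.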
    Limits of Model Selection under Transfer Learning. (arXiv:2305.00152v3 [stat.ML] UPDATED)
    Theoretical studies on transfer learning or domain adaptation have so far focused on situations with a known hypothesis class or model; however in practice, some amount of model selection is usually involved, often appearing under the umbrella term of hyperparameter-tuning: for example, one may think of the problem of tuning for the right neural network architecture towards a target task, while leveraging data from a related source task. Now, in addition to the usual tradeoffs on approximation vs estimation errors involved in model selection, this problem brings in a new complexity term, namely, the transfer distance between source and target distributions, which is known to vary with the choice of hypothesis class. We present a first study of this problem, focusing on classification; in particular, the analysis reveals some remarkable phenomena: adaptive rates, i.e., those achievable with no distributional information, can be arbitrarily slower than oracle rates, i.e., when given knowledge on distances.
    Integrating Tick-level Data and Periodical Signal for High-frequency Market Making. (arXiv:2306.17179v1 [q-fin.TR])
We focus on the problem of market making in high-frequency trading. Market making is a critical function in financial markets that involves providing liquidity by buying and selling assets. However, the increasing complexity of financial markets and the high volume of data generated by tick-level trading make it challenging to develop effective market making strategies. To address this challenge, we propose a deep reinforcement learning approach that fuses tick-level data with periodic prediction signals to develop a more accurate and robust market making strategy. Our results of market making strategies based on different deep reinforcement learning algorithms under the simulation scenarios and real data experiments in the cryptocurrency markets show that the proposed framework outperforms existing methods in terms of profitability and risk management.
    Steganographic Capacity of Deep Learning Models. (arXiv:2306.17189v1 [cs.CR])
As machine learning and deep learning models become ubiquitous, it is inevitable that there will be attempts to exploit such models in various attack scenarios. For example, in a steganographic-based attack, information could be hidden in a learning model, which might then be used to distribute malware, or for other malicious purposes. In this research, we consider the steganographic capacity of several learning models. Specifically, we train a Multilayer Perceptron (MLP), Convolutional Neural Network (CNN), and Transformer model on a challenging malware classification problem. For each of the resulting models, we determine the number of low-order bits of the trained parameters that can be altered without significantly affecting the performance of the model. We find that the steganographic capacity of the learning models tested is surprisingly high, and that in each case, there is a clear threshold after which model performance rapidly degrades.
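The low-order-bit mechanism can be sketched on a single float32 parameter; this is a generic illustration, not the paper's models or capacity measurements. Overwriting the low mantissa bits hides a message while perturbing the value by only a few units in the last place.

```python
import struct

def embed_bits(weight, bits):
    """Overwrite the len(bits) low-order mantissa bits of a float32 weight."""
    (u,) = struct.unpack("<I", struct.pack("<f", weight))  # float32 bit pattern
    k = len(bits)
    payload = int("".join(map(str, bits)), 2)
    u = (u & ~((1 << k) - 1)) | payload                    # clear then set low k bits
    (out,) = struct.unpack("<f", struct.pack("<I", u))
    return out

def extract_bits(weight, k):
    """Read back the k low-order mantissa bits, most significant first."""
    (u,) = struct.unpack("<I", struct.pack("<f", weight))
    return [(u >> (k - 1 - i)) & 1 for i in range(k)]

w = 0.3141592
msg = [1, 0, 1, 1, 0, 1, 0, 1]          # one byte hidden in one parameter
w_stego = embed_bits(w, msg)
```

Scaling this across millions of parameters is what makes the capacity large; the accuracy cliff the abstract mentions appears once the overwritten bits reach into the high mantissa.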
    Federated Ensemble YOLOv5 - A Better Generalized Object Detection Algorithm. (arXiv:2306.17829v1 [cs.CV])
Federated learning (FL) has gained significant traction as a privacy-preserving algorithm, but the resemblance of federated learning algorithms such as Federated Averaging (FedAvg) or Federated SGD (FedSGD) to ensemble learning algorithms has not been fully explored. The purpose of this paper is to examine the application of FL to object detection as a method to enhance generalizability, and to compare its performance against a centralized training approach for an object detection algorithm. Specifically, we investigate the performance of a YOLOv5 model trained using FL across multiple clients and employ a random sampling strategy without replacement, so each client holds a portion of the same dataset used for centralized training. Our experimental results showcase the superior efficiency of the FL object detector's global model in generating accurate bounding boxes for unseen objects, with the test set being a mixture of objects from two distinct clients not represented in the training dataset. These findings suggest that FL can be viewed from an ensemble algorithm perspective, akin to a synergistic blend of Bagging and Boosting techniques. As a result, FL can be seen not only as a method to enhance privacy, but also as a method to enhance the performance of a machine learning model.
    Learning Environment Models with Continuous Stochastic Dynamics. (arXiv:2306.17204v1 [cs.LG])
Solving control tasks in complex environments automatically through learning offers great potential. While contemporary techniques from deep reinforcement learning (DRL) provide effective solutions, their decision-making is not transparent. We aim to provide insights into the decisions faced by the agent by learning an automaton model of environmental behavior under the control of an agent. However, for most control problems, automata learning is not scalable enough to learn a useful model. In this work, we raise the capabilities of automata learning such that it is possible to learn models for environments that have complex and continuous dynamics. The core of the scalability of our method lies in the computation of an abstract state-space representation, by applying dimensionality reduction and clustering on the observed environmental state space. The stochastic transitions are learned via passive automata learning from observed interactions of the agent and the environment. In an iterative model-based RL process, we sample additional trajectories to learn an accurate environment model in the form of a discrete-state Markov decision process (MDP). We apply our automata learning framework on popular RL benchmarking environments in the OpenAI Gym, including LunarLander, CartPole, Mountain Car, and Acrobot. Our results show that the learned models are so precise that they enable the computation of policies solving the respective control tasks. Yet the models are more concise and more general than neural-network-based policies and by using MDPs we benefit from a wealth of tools available for analyzing them. When solving the task of LunarLander, the learned model even achieved similar or higher rewards than deep RL policies learned with stable-baselines3.
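The passive-learning step of the pipeline above can be sketched once states are already abstracted: transition probabilities of the discrete MDP are just frequency counts over observed (state, action, next-state) triples. The dimensionality reduction and clustering stages are omitted, and the abstract state and action names below are invented for the example.

```python
from collections import defaultdict

def estimate_mdp(trajectories):
    """Estimate P(s' | s, a) by frequency counts over observed transitions."""
    counts = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        for s, a, s_next in traj:
            counts[(s, a)][s_next] += 1
    # Normalize each (state, action) row into a probability distribution.
    return {sa: {s2: c / sum(d.values()) for s2, c in d.items()}
            for sa, d in counts.items()}

# Toy trajectories over already-clustered abstract states.
trajs = [
    [("low", "push", "mid"), ("mid", "push", "high")],
    [("low", "push", "mid"), ("mid", "push", "mid")],
    [("mid", "coast", "low")],
]
P = estimate_mdp(trajs)
```

The resulting dictionary is the discrete-state MDP on which standard analysis and policy-computation tools can then be applied.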
    Fast and Robust State Estimation and Tracking via Hierarchical Learning. (arXiv:2306.17267v1 [cs.LG])
Fully distributed estimation and tracking solutions to large-scale multi-agent networks suffer slow convergence and are vulnerable to network failures. In this paper, we aim to speed up the convergence and enhance the resilience of state estimation and tracking using a simple hierarchical system architecture wherein agents are clustered into smaller networks, and a parameter server exists to aid the information exchanges among networks. The information exchange among networks is expensive and occurs only once in a while. We propose two consensus + innovation algorithms for the state estimation and tracking problems, respectively. In both algorithms, we use a novel hierarchical push-sum consensus component. For state estimation, we use dual averaging as the local innovation component. State tracking is much harder to tackle in the presence of dropping-link failures, and the standard integration of the consensus and innovation approaches is no longer applicable. Moreover, dual averaging is no longer feasible. Our algorithm introduces a pair of additional variables per link, ensures the relevant local variables evolve according to the state dynamics, and uses projected local gradient descent as the local innovation component. We also characterize the convergence rates of both algorithms under a linear local observation model and minimal technical assumptions. We numerically validate our algorithms through simulations of both state estimation and tracking problems.
    TTSWING: a Dataset for Table Tennis Swing Analysis. (arXiv:2306.17550v1 [cs.LG])
    We introduce TTSWING, a novel dataset designed for table tennis swing analysis. This dataset comprises comprehensive swing information obtained through 9-axis sensors integrated into custom-made racket grips, accompanied by anonymized demographic data of the players. We detail the data collection and annotation procedures. Furthermore, we conduct pilot studies utilizing diverse machine learning models for swing analysis. TTSWING holds tremendous potential to facilitate innovative research in table tennis analysis and is a valuable resource for the scientific community. We release the dataset and experimental codes at https://github.com/DEPhantom/TTSWING.
    Towards Zero-Shot Scale-Aware Monocular Depth Estimation. (arXiv:2306.17253v1 [cs.CV])
Monocular depth estimation is scale-ambiguous, and thus requires scale supervision to produce metric predictions. Even so, the resulting models will be geometry-specific, with learned scales that cannot be directly transferred across domains. Because of that, recent works focus instead on relative depth, eschewing scale in favor of improved up-to-scale zero-shot transfer. In this work we introduce ZeroDepth, a novel monocular depth estimation framework capable of predicting metric scale for arbitrary test images from different domains and camera parameters. This is achieved by (i) the use of input-level geometric embeddings that enable the network to learn a scale prior over objects; and (ii) decoupling the encoder and decoder stages, via a variational latent representation that is conditioned on single frame information. We evaluated ZeroDepth targeting both outdoor (KITTI, DDAD, nuScenes) and indoor (NYUv2) benchmarks, and achieved a new state-of-the-art in both settings using the same pre-trained model, outperforming methods that train on in-domain data and require test-time scaling to produce metric estimates.
    Residual Feature Pyramid Network for Enhancement of Vascular Patterns. (arXiv:2306.17200v1 [eess.IV])
The accuracy of finger vein recognition systems gets degraded due to low and uneven contrast between veins and surroundings, often resulting in poor detection of vein patterns. We propose a finger-vein enhancement technique, ResFPN (Residual Feature Pyramid Network), as a generic preprocessing method agnostic to the recognition pipeline. A bottom-up pyramidal architecture using the novel Structure Detection block (SDBlock) facilitates extraction of veins of varied widths. Using a feature aggregation module (FAM), we combine these vein-structures, and train the proposed ResFPN for detection of veins across scales. With enhanced presentations, our experiments indicate a reduction of up to 5% in the average recognition errors for a commonly used recognition pipeline over two publicly available datasets. These improvements persist even in the cross-dataset scenario, where the dataset used to train the ResFPN is different from the one used for recognition.
    Scattering Spectra Models for Physics. (arXiv:2306.17210v1 [physics.data-an])
Physicists routinely need probabilistic models for a number of tasks such as parameter inference or the generation of new realizations of a field. Establishing such models for highly non-Gaussian fields is a challenge, especially when the number of samples is limited. In this paper, we introduce scattering spectra models for stationary fields and we show that they provide accurate and robust statistical descriptions of a wide range of fields encountered in physics. These models are based on covariances of scattering coefficients, i.e. wavelet decomposition of a field coupled with a point-wise modulus. After introducing useful dimension reductions taking advantage of the regularity of a field under rotation and scaling, we validate these models on various multi-scale physical fields and demonstrate that they reproduce standard statistics, including spatial moments up to 4th order. These scattering spectra provide us with a low-dimensional structured representation that captures key properties encountered in a wide range of physical fields. These generic models can be used for data exploration, classification, parameter inference, symmetry detection, and component separation.
    An end-to-end framework for gene expression classification by integrating a background knowledge graph: application to cancer prognosis prediction. (arXiv:2306.17202v1 [q-bio.QM])
Biological data may be separated into primary data, such as gene expression, and secondary data, such as pathways and protein-protein interactions. Methods using secondary data to enhance the analysis of primary data are promising, because secondary data have background information that is not included in primary data. In this study, we proposed an end-to-end framework to integrally handle secondary data to construct a classification model for primary data. We applied this framework to cancer prognosis prediction using gene expression data and a biological network. Cross-validation results indicated that our model achieved higher accuracy compared with a deep neural network model without background biological network information. Experiments conducted in patient groups by cancer type showed improvement in ROC-area under the curve for many groups. Visualizations of high accuracy cancer types identified contributing genes and pathways by enrichment analysis. Known biomarkers and novel biomarker candidates were identified through these experiments.
    Landmark Guided Active Exploration with Stable Low-level Policy Learning. (arXiv:2306.17484v1 [cs.LG])
    Goal-conditioned hierarchical reinforcement learning (GCHRL) decomposes long-horizon tasks into sub-tasks through a hierarchical framework and it has demonstrated promising results across a variety of domains. However, the high-level policy's action space is often excessively large, presenting a significant challenge to effective exploration and resulting in potentially inefficient training. Moreover, the dynamic variability of the low-level policy introduces non-stationarity to the high-level state transition function, significantly impeding the learning of the high-level policy. In this paper, we design a measure of prospect for subgoals by planning in the goal space based on the goal-conditioned value function. Building upon the measure of prospect, we propose a landmark-guided exploration strategy by integrating the measures of prospect and novelty which aims to guide the agent to explore efficiently and improve sample efficiency. To address the non-stationarity arising from the dynamic changes of the low-level policy, we apply a state-specific regularization to the learning of low-level policy, which facilitates stable learning of the hierarchical policy. The experimental results demonstrate that our proposed exploration strategy significantly outperforms the baseline methods across multiple tasks.
    Scalable tensor methods for nonuniform hypergraphs. (arXiv:2306.17825v1 [math.NA])
    While multilinear algebra appears natural for studying the multiway interactions modeled by hypergraphs, tensor methods for general hypergraphs have been stymied by theoretical and practical barriers. A recently proposed adjacency tensor is applicable to nonuniform hypergraphs, but is prohibitively costly to form and analyze in practice. We develop tensor times same vector (TTSV) algorithms for this tensor which improve complexity from $O(n^r)$ to a low-degree polynomial in $r$, where $n$ is the number of vertices and $r$ is the maximum hyperedge size. Our algorithms are implicit, avoiding formation of the order $r$ adjacency tensor. We demonstrate the flexibility and utility of our approach in practice by developing tensor-based hypergraph centrality and clustering algorithms. We also show these tensor measures offer complementary information to analogous graph-reduction approaches on data, and are also able to detect higher-order structure that many existing matrix-based approaches provably cannot.
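The core idea of an implicit TTSV can be shown in the uniform base case; the paper's contribution is handling nonuniform hypergraphs, which is not reproduced here. For a 3-uniform hypergraph with symmetric 0/1 adjacency tensor, the product $(A x^{2})_i$ can be accumulated edge by edge in $O(|E|)$ without ever forming the $n^3$ tensor. The example hypergraph and vector are made up.

```python
def ttsv1_3uniform(edges, x):
    """(A x^2)_i for a 3-uniform hypergraph, edge by edge, never forming A."""
    n = len(x)
    b = [0.0] * n
    for i, j, k in edges:
        # Each unordered edge {i, j, k} contributes 2 ordered terms per
        # incident vertex (the two orderings of the remaining pair).
        b[i] += 2.0 * x[j] * x[k]
        b[j] += 2.0 * x[i] * x[k]
        b[k] += 2.0 * x[i] * x[j]
    return b

b = ttsv1_3uniform([(0, 1, 2), (1, 2, 3)], [1.0, 2.0, 3.0, 4.0])
```

Repeatedly applying such a product is the building block for the tensor-based centrality and clustering computations the abstract describes.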
    Why can neural language models solve next-word prediction? A mathematical perspective. (arXiv:2306.17184v1 [cs.CL])
Recently, deep learning has revolutionized the field of natural language processing, with neural language models proving to be very effective for next-word prediction. However, a rigorous theoretical explanation for their success in the context of formal language theory has not yet been developed, as it is unclear why neural language models can learn the combinatorial rules that govern the next-word prediction task. In this paper, we study a class of formal languages that can be used to model real-world examples of English sentences. We construct neural language models that can solve the next-word prediction task in this context with zero error. Our proof highlights the different roles of the embedding layer and the fully connected component within the neural language model.
    Resetting the Optimizer in Deep RL: An Empirical Study. (arXiv:2306.17833v1 [cs.LG])
We focus on the task of approximating the optimal value function in deep reinforcement learning. This iterative process consists of approximately solving a sequence of optimization problems where the objective function can change per iteration. The common approach to solving the problem is to employ modern variants of the stochastic gradient descent algorithm such as Adam. These optimizers maintain their own internal parameters such as estimates of the first and the second moment of the gradient, and update these parameters over time. Therefore, information obtained in previous iterations is being used to solve the optimization problem in the current iteration. We hypothesize that this can contaminate the internal parameters of the employed optimizer in situations where the optimization landscape of the previous iterations is quite different from the current iteration. To hedge against this effect, a simple idea is to reset the internal parameters of the optimizer when starting a new iteration. We empirically investigate this resetting strategy by employing various optimizers in conjunction with the Rainbow algorithm. We demonstrate that this simple modification unleashes the true potential of modern optimizers, and significantly improves the performance of deep RL on the Atari benchmark.
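The resetting idea can be sketched with a scalar Adam whose `reset()` clears the moment estimates and timestep between iterations. The two quadratic targets below are invented stand-ins for successive value-fitting problems; this illustrates the mechanism, not the Rainbow experiments.

```python
import math

class Adam:
    """Minimal scalar Adam; reset() clears the stale internal state."""
    def __init__(self, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, b1, b2, eps
        self.reset()
    def reset(self):
        self.m = self.v = 0.0   # first/second moment estimates
        self.t = 0              # timestep for bias correction
    def step(self, param, grad):
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * grad
        self.v = self.b2 * self.v + (1 - self.b2) * grad * grad
        m_hat = self.m / (1 - self.b1 ** self.t)
        v_hat = self.v / (1 - self.b2 ** self.t)
        return param - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)

opt = Adam()
x = 5.0
for _ in range(100):            # iteration 1: minimize (x - 2)^2
    x = opt.step(x, 2 * (x - 2))
opt.reset()                     # new objective: stale moments would point the wrong way
for _ in range(100):            # iteration 2: minimize (x + 1)^2
    x = opt.step(x, 2 * (x + 1))
```

Without the `reset()` call, the first moment accumulated while chasing the old objective initially drags the parameter away from the new optimum; clearing the state starts the new iteration unbiased.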
    Federated Object Detection for Quality Inspection in Shared Production. (arXiv:2306.17645v1 [cs.LG])
    Federated learning (FL) has emerged as a promising approach for training machine learning models on decentralized data without compromising data privacy. In this paper, we propose a FL algorithm for object detection in quality inspection tasks using YOLOv5 as the object detection algorithm and Federated Averaging (FedAvg) as the FL algorithm. We apply this approach to a manufacturing use-case where multiple factories/clients contribute data for training a global object detection model while preserving data privacy on a non-IID dataset. Our experiments demonstrate that our FL approach achieves better generalization performance on the overall clients' test dataset and generates improved bounding boxes around the objects compared to models trained using local clients' datasets. This work showcases the potential of FL for quality inspection tasks in the manufacturing industry and provides valuable insights into the performance and feasibility of utilizing YOLOv5 and FedAvg for federated object detection.
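The FedAvg aggregation step at the heart of this setup can be sketched in a few lines: the server averages client parameter vectors weighted by local dataset size. The toy two-client parameters below are invented; real YOLOv5 weights would be tensors, not short lists.

```python
def fedavg(client_params, client_sizes):
    """Size-weighted average of client parameter vectors (FedAvg aggregation)."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [sum(p[i] * s for p, s in zip(client_params, client_sizes)) / total
            for i in range(dim)]

# Two clients with toy 2-parameter "models" and unequal dataset sizes:
# the larger client (3 samples) pulls the global model toward its weights.
global_params = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```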
    Practical and Asymptotically Exact Conditional Sampling in Diffusion Models. (arXiv:2306.17775v1 [stat.ML])
    Diffusion models have been successful on a range of conditional generation tasks including molecular design and text-to-image generation. However, these achievements have primarily depended on task-specific conditional training or error-prone heuristic approximations. Ideally, a conditional generation method should provide exact samples for a broad range of conditional distributions without requiring task-specific training. To this end, we introduce the Twisted Diffusion Sampler, or TDS. TDS is a sequential Monte Carlo (SMC) algorithm that targets the conditional distributions of diffusion models. The main idea is to use twisting, an SMC technique that enjoys good computational efficiency, to incorporate heuristic approximations without compromising asymptotic exactness. We first find in simulation and on MNIST image inpainting and class-conditional generation tasks that TDS provides a computational statistical trade-off, yielding more accurate approximations with many particles but with empirical improvements over heuristics with as few as two particles. We then turn to motif-scaffolding, a core task in protein design, using a TDS extension to Riemannian diffusion models. On benchmark test cases, TDS allows flexible conditioning criteria and often outperforms the state of the art.
    Improving Federated Aggregation with Deep Unfolding Networks. (arXiv:2306.17362v1 [cs.LG])
The performance of Federated learning (FL) is negatively affected by device differences and statistical characteristics between participating clients. To address this issue, we introduce a deep unfolding network (DUN)-based technique that learns adaptive weights that unbiasedly ameliorate the adverse impacts of heterogeneity. The proposed method demonstrates impressive accuracy and quality-aware aggregation. Furthermore, we evaluate a best-weighted normalization approach to reduce the computational cost of the aggregation method. The numerical experiments in this study demonstrate the effectiveness of this approach and provide insights into the interpretability of the unbiased weights learned. By incorporating unbiased weights into the model, the proposed approach effectively addresses quality-aware aggregation under the heterogeneity of the participating clients and the FL environment. Codes and details are \href{https://github.com/shanikairoshi/Improved_DUN_basedFL_Aggregation}{here}.
    Algorithms for bounding contribution for histogram estimation under user-level privacy. (arXiv:2206.03008v2 [cs.LG] UPDATED)
    We study the problem of histogram estimation under user-level differential privacy, where the goal is to preserve the privacy of all entries of any single user. We consider the heterogeneous scenario where the quantity of data can be different for each user. In this scenario, the amount of noise injected into the histogram to obtain differential privacy is proportional to the maximum user contribution, which can be amplified by few outliers. One approach to circumvent this would be to bound (or limit) the contribution of each user to the histogram. However, if users are limited to small contributions, a significant amount of data will be discarded. In this work, we propose algorithms to choose the best user contribution bound for histogram estimation under both bounded and unbounded domain settings. When the size of the domain is bounded, we propose a user contribution bounding strategy that almost achieves a two-approximation with respect to the best contribution bound in hindsight. For unbounded domain histogram estimation, we propose an algorithm that is logarithmic-approximation with respect to the best contribution bound in hindsight. This result holds without any distribution assumptions on the data. Experiments on both real and synthetic datasets verify our theoretical findings and demonstrate the effectiveness of our algorithms. We also show that clipping bias introduced by bounding user contribution may be reduced under mild distribution assumptions, which can be of independent interest.
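The contribution-bounding idea can be sketched as follows, assuming a fixed bound rather than the adaptively chosen one the paper proposes; `dp_histogram` and its arguments are hypothetical names for illustration.

```python
import numpy as np

def dp_histogram(user_data, bound, domain_size, epsilon, rng):
    """User-level DP histogram sketch: clip each user's contribution to
    at most `bound` items (discarding the rest), then add Laplace noise
    with scale bound/epsilon, since the per-user contribution bound is
    the sensitivity of the clipped histogram."""
    hist = np.zeros(domain_size)
    for items in user_data:
        for x in items[:bound]:                # data beyond the bound is discarded
            hist[x] += 1
    noise = rng.laplace(scale=bound / epsilon, size=domain_size)
    return hist + noise

rng = np.random.default_rng(0)
users = [[0, 0, 1], [1, 2], [0] * 10]          # heterogeneous contributions
noisy = dp_histogram(users, bound=2, domain_size=3, epsilon=1.0, rng=rng)
```

The sketch makes the trade-off visible: a small `bound` shrinks the noise scale but throws away most of the third user's data, which is exactly the tension the paper's bound-selection algorithms address.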
    TemperatureGAN: Generative Modeling of Regional Atmospheric Temperatures. (arXiv:2306.17248v1 [cs.LG])
    Stochastic generators are useful for estimating climate impacts on various sectors. Projecting climate risk in various sectors, e.g. energy systems, requires generators that are accurate (statistical resemblance to ground-truth), reliable (do not produce erroneous examples), and efficient. Leveraging data from the North American Land Data Assimilation System, we introduce TemperatureGAN, a Generative Adversarial Network conditioned on months, locations, and time periods, to generate 2m above ground atmospheric temperatures at an hourly resolution. We propose evaluation methods and metrics to measure the quality of generated samples. We show that TemperatureGAN produces high-fidelity examples with good spatial representation and temporal dynamics consistent with known diurnal cycles.
    Prediction of COVID-19 Patients' Emergency Room Revisit using Multi-Source Transfer Learning. (arXiv:2306.17257v1 [cs.LG])
    The coronavirus disease 2019 (COVID-19) has led to a global pandemic of significant severity. In addition to its high level of contagiousness, COVID-19 can have a heterogeneous clinical course, ranging from asymptomatic carriers to severe and potentially life-threatening health complications. Many patients have to revisit the emergency room (ER) within a short time after discharge, which significantly increases the workload for medical staff. Early identification of such patients is crucial for helping physicians focus on treating life-threatening cases. In this study, we obtained Electronic Health Records (EHRs) of 3,210 encounters from 13 affiliated ERs within the University of Pittsburgh Medical Center between March 2020 and January 2021. We leveraged a Natural Language Processing technique, ScispaCy, to extract clinical concepts and used the 1001 most frequent concepts to develop 7-day revisit models for COVID-19 patients in ERs. The research data we collected from 13 ERs may have distributional differences that could affect the model development. To address this issue, we employed a classic deep transfer learning method called the Domain Adversarial Neural Network (DANN) and evaluated different modeling strategies, including the Multi-DANN algorithm, the Single-DANN algorithm, and three baseline methods. Results showed that the Multi-DANN models outperformed the Single-DANN models and baseline models in predicting revisits of COVID-19 patients to the ER within 7 days after discharge. Notably, the Multi-DANN strategy effectively addressed the heterogeneity among multiple source domains and improved the adaptation of source data to the target domain. Moreover, the high performance of Multi-DANN models indicates that EHRs are informative for developing a prediction model to identify COVID-19 patients who are very likely to revisit an ER within 7 days after discharge.
    Design of Induction Machines using Reinforcement Learning. (arXiv:2306.17626v1 [cs.LG])
    The design of induction machines is a challenging task due to different electromagnetic and thermal constraints. Quick estimation of a machine's dimensions is important in the sales tool to provide quick quotations to customers based on specific requirements. The key part of this process is to select different design parameters like length, diameter, tooth tip height and winding turns to achieve certain torque, current and temperature of the machine. Electrical machine designers, with their experience, know how to alter different machine design parameters to achieve customer-specific operating requirements. We propose a reinforcement learning algorithm to design a customised induction motor. The neural network model is trained off-line by simulating different instances of the electrical machine design game with a reward or penalty function when a good or bad design choice is made. The results demonstrate that the suggested method automates electrical machine design without applying any human engineering knowledge.
    The power of motifs as inductive bias for learning molecular distributions. (arXiv:2306.17246v1 [cs.LG])
    Machine learning for molecules holds great potential for efficiently exploring the vast chemical space and thus streamlining the drug discovery process by facilitating the design of new therapeutic molecules. Deep generative models have shown promising results for molecule generation, but the benefits of specific inductive biases for learning distributions over small graphs are unclear. Our study aims to investigate the impact of subgraph structures and vocabulary design on distribution learning, using small drug molecules as a case study. To this end, we introduce Subcover, a new subgraph-based fragmentation scheme, and evaluate it through a two-step variational auto-encoder. Our results show that Subcover's improved identification of chemically meaningful subgraphs leads to a relative improvement of the FCD score by 30%, outperforming previous methods. Our findings highlight the potential of Subcover to enhance the performance and scalability of existing methods, contributing to the advancement of drug discovery.
    Limits of Machine Learning for Automatic Vulnerability Detection. (arXiv:2306.17193v1 [cs.CR])
    Recent results of machine learning for automatic vulnerability detection have been very promising indeed: Given only the source code of a function $f$, models trained by machine learning techniques can decide if $f$ contains a security flaw with up to 70% accuracy. But how do we know that these results are general and not specific to the datasets? To study this question, researchers proposed to amplify the testing set by injecting semantic preserving changes and found that the model's accuracy significantly drops. In other words, the model uses some unrelated features during classification. In order to increase the robustness of the model, researchers proposed to train on amplified training data, and indeed model accuracy increased to previous levels. In this paper, we replicate and continue this investigation, and provide an actionable model benchmarking methodology to help researchers better evaluate advances in machine learning for vulnerability detection. Specifically, we propose (i) a cross validation algorithm, where a semantic preserving transformation is applied during the amplification of either the training set or the testing set, and (ii) the amplification of the testing set with code snippets where the vulnerabilities are fixed. Using 11 transformations, 3 ML techniques, and 2 datasets, we find that the improved robustness only applies to the specific transformations used during training data amplification. In other words, the robustified models still rely on unrelated features for predicting the vulnerabilities in the testing data. Additionally, we find that the trained models are unable to generalize to the modified setting, which requires distinguishing vulnerable functions from their patches.
    Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models. (arXiv:2306.17203v1 [cs.SD])
    The Video-to-Audio (V2A) model has recently gained attention for its practical application in generating audio directly from silent videos, particularly in video/film production. However, previous methods in V2A have limited generation quality in terms of temporal synchronization and audio-visual relevance. We present Diff-Foley, a synchronized Video-to-Audio synthesis method with a latent diffusion model (LDM) that generates high-quality audio with improved synchronization and audio-visual relevance. We adopt contrastive audio-visual pretraining (CAVP) to learn more temporally and semantically aligned features, then train an LDM with CAVP-aligned visual features on spectrogram latent space. The CAVP-aligned features enable the LDM to capture the subtler audio-visual correlation via a cross-attention module. We further significantly improve sample quality with `double guidance'. Diff-Foley achieves state-of-the-art V2A performance on a current large-scale V2A dataset. Furthermore, we demonstrate Diff-Foley's practical applicability and generalization capabilities via downstream finetuning. Project Page: https://diff-foley.github.io/
    On the Exploitability of Instruction Tuning. (arXiv:2306.17194v1 [cs.CR])
    Instruction tuning is an effective technique to align large language models (LLMs) with human intents. In this work, we investigate how an adversary can exploit instruction tuning by injecting specific instruction-following examples into the training data that intentionally changes the model's behavior. For example, an adversary can achieve content injection by injecting training examples that mention target content and eliciting such behavior from downstream models. To achieve this goal, we propose \textit{AutoPoison}, an automated data poisoning pipeline. It naturally and coherently incorporates versatile attack goals into poisoned data with the help of an oracle LLM. We showcase two example attacks: content injection and over-refusal attacks, each aiming to induce a specific exploitable behavior. We quantify and benchmark the strength and the stealthiness of our data poisoning scheme. Our results show that AutoPoison allows an adversary to change a model's behavior by poisoning only a small fraction of data while maintaining a high level of stealthiness in the poisoned examples. We hope our work sheds light on how data quality affects the behavior of instruction-tuned models and raises awareness of the importance of data quality for responsible deployments of LLMs. Code is available at \url{https://github.com/azshue/AutoPoison}.
    Classification and Explanation of Distributed Denial-of-Service (DDoS) Attack Detection using Machine Learning and Shapley Additive Explanation (SHAP) Methods. (arXiv:2306.17190v1 [cs.CR])
    DDoS attacks involve overwhelming a target system with a large number of requests or traffic from multiple sources, disrupting the normal traffic of a targeted server, service, or network. Distinguishing between legitimate traffic and malicious traffic is a challenging task. It is possible to classify legitimate and malicious traffic and analyze the network traffic by using machine learning and deep learning techniques. However, explaining how a model classifies a traffic flow as benign or malicious is an important investigation of the model's inner workings and increases its trustworthiness. Explainable Artificial Intelligence (XAI) can explain the decision-making of the machine learning models used to classify and identify DDoS traffic. In this context, we propose a framework that can not only classify legitimate traffic and malicious traffic of DDoS attacks but also use SHAP to explain the decision-making of the classifier model. To address this concern, we first adopt feature selection techniques to select the top 20 important features based on feature importance techniques (e.g., XGB-based SHAP feature importance). Following that, the Multi-layer Perceptron Network (MLP) part of our proposed model uses the optimized features of the DDoS attack dataset as inputs to classify legitimate and malicious traffic. We perform extensive experiments with all features and selected features. The evaluation results show that the model performance with selected features achieves above 99\% accuracy. Finally, to provide interpretability, XAI can be adopted to explain the model performance between the prediction results and features based on global and local explanations by SHAP, which can better explain the results achieved by our proposed framework.
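The top-k feature selection step can be sketched as below, with precomputed scores standing in for the XGB-based SHAP importances used in the paper; `select_top_k` is a hypothetical helper.

```python
import numpy as np

def select_top_k(X, importances, k):
    """Keep the k most important feature columns, given one precomputed
    importance score per column (a stand-in for SHAP feature
    importances). Returns the reduced matrix and the kept indices."""
    idx = np.sort(np.argsort(importances)[::-1][:k])  # top-k, in column order
    return X[:, idx], idx

X = np.arange(12.0).reshape(3, 4)              # 3 flows, 4 features
scores = np.array([0.1, 0.7, 0.05, 0.4])
X_sel, kept = select_top_k(X, scores, k=2)     # keeps columns 1 and 3
```

The reduced matrix would then feed the MLP classifier; the paper selects the top 20 of its features this way.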
    Probabilistic Constraint for Safety-Critical Reinforcement Learning. (arXiv:2306.17279v1 [cs.LG])
    In this paper, we consider the problem of learning safe policies for probabilistic-constrained reinforcement learning (RL). Specifically, a safe policy or controller is one that, with high probability, maintains the trajectory of the agent in a given safe set. We establish a connection between this probabilistic-constrained setting and the cumulative-constrained formulation that is frequently explored in the existing literature. We provide theoretical bounds elucidating that the probabilistic-constrained setting offers a better trade-off in terms of optimality and safety (constraint satisfaction). The challenge encountered when dealing with the probabilistic constraints, as explored in this work, arises from the absence of explicit expressions for their gradients. Our prior work provides such an explicit gradient expression for probabilistic constraints, which we term Safe Policy Gradient-REINFORCE (SPG-REINFORCE). In this work, we provide an improved gradient estimator, SPG-Actor-Critic, that has lower variance than SPG-REINFORCE, which is substantiated by our theoretical results. A noteworthy aspect of both SPGs is their inherent algorithm independence, rendering them versatile for application across a range of policy-based algorithms. Furthermore, we propose a Safe Primal-Dual algorithm that can leverage both SPGs to learn safe policies. It is subsequently followed by theoretical analyses that encompass the convergence of the algorithm, as well as the near-optimality and feasibility on average. In addition, we test the proposed approaches through a series of empirical experiments. These experiments aim to examine and analyze the inherent trade-offs between optimality and safety, and serve to substantiate the efficacy of the two SPGs, as well as our theoretical contributions.
    Suffering Toasters. (arXiv:2306.17258v1 [cs.AI])
    A widely accepted definition of intelligence in the context of Artificial Intelligence (AI) still eludes us. Due to our exceedingly rapid development of AI paradigms, architectures, and tools, the prospect of naturally arising AI consciousness seems more likely than ever. In this paper, we claim that all current intelligence tests are insufficient to point to the existence or lack of intelligence \textbf{as humans intuitively perceive it}. We draw from ideas in the philosophy of science, psychology, and other areas of research to provide a clearer definition of the problems of artificial intelligence, self-awareness, and agency. We furthermore propose a new heuristic approach to test for artificial self-awareness and outline a possible implementation. Finally, we discuss some of the questions that arise from this new heuristic, be they philosophical or implementation-oriented.
    Learning Delays in Spiking Neural Networks using Dilated Convolutions with Learnable Spacings. (arXiv:2306.17670v1 [cs.NE])
    Spiking Neural Networks (SNNs) are a promising research direction for building power-efficient information processing systems, especially for temporal tasks such as speech recognition. In SNNs, delays refer to the time needed for one spike to travel from one neuron to another. These delays matter because they influence the spike arrival times, and it is well-known that spiking neurons respond more strongly to coincident input spikes. More formally, it has been shown theoretically that plastic delays greatly increase the expressivity in SNNs. Yet, efficient algorithms to learn these delays have been lacking. Here, we propose a new discrete-time algorithm that addresses this issue in deep feedforward SNNs using backpropagation, in an offline manner. To simulate delays between consecutive layers, we use 1D convolutions across time. The kernels contain only a few non-zero weights - one per synapse - whose positions correspond to the delays. These positions are learned together with the weights using the recently proposed Dilated Convolution with Learnable Spacings (DCLS). We evaluated our method on the Spiking Heidelberg Dataset (SHD) and the Spiking Speech Commands (SSC) benchmarks, which require detecting temporal patterns. We used feedforward SNNs with two hidden fully connected layers. We showed that fixed random delays help, and that learning them helps even more. Furthermore, our method outperformed the state-of-the-art in both SHD and SSC without using recurrent connections and with substantially fewer parameters. Our work demonstrates the potential of delay learning in developing accurate and precise models for temporal data processing. Our code is based on PyTorch / SpikingJelly and available at: https://github.com/Thvnvtos/SNN-delays
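The core trick, encoding a delay as the position of a kernel's single non-zero weight, can be sketched with a plain 1D convolution. This is a simplification of DCLS, whose positions are real-valued and learned; the names below are illustrative.

```python
import numpy as np

def delayed_response(spike_train, weight, delay, max_delay):
    """A 1D temporal convolution whose kernel has a single non-zero
    entry at index `delay` shifts the input spike train by `delay`
    steps: the position of the weight *is* the synaptic delay."""
    kernel = np.zeros(max_delay + 1)
    kernel[delay] = weight                     # one non-zero weight per synapse
    # full convolution then truncation gives a causal, delayed copy
    return np.convolve(spike_train, kernel)[: len(spike_train)]

spikes = np.array([0.0, 1.0, 0.0, 0.0, 1.0, 0.0])
out = delayed_response(spikes, weight=0.5, delay=2, max_delay=3)
# the spike at t=1 now arrives at t=3; the one at t=4 falls past the window
```

Learning the delay then amounts to learning the index (in DCLS, a continuous position), jointly with the weight, by backpropagation.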
    Improving Expert Predictions with Conformal Prediction. (arXiv:2201.12006v5 [cs.LG] UPDATED)
    Automated decision support systems promise to help human experts solve multiclass classification tasks more efficiently and accurately. However, existing systems typically require experts to understand when to cede agency to the system or when to exercise their own agency. Otherwise, the experts may be better off solving the classification tasks on their own. In this work, we develop an automated decision support system that, by design, does not require experts to understand when to trust the system to improve performance. Rather than providing (single) label predictions and letting experts decide when to trust these predictions, our system provides sets of label predictions constructed using conformal prediction$\unicode{x2014}$prediction sets$\unicode{x2014}$and forcefully asks experts to predict labels from these sets. By using conformal prediction, our system can precisely trade-off the probability that the true label is not in the prediction set, which determines how frequently our system will mislead the experts, and the size of the prediction set, which determines the difficulty of the classification task the experts need to solve using our system. In addition, we develop an efficient and near-optimal search method to find the conformal predictor under which the experts benefit the most from using our system. Simulation experiments using synthetic and real expert predictions demonstrate that our system may help experts make more accurate predictions and is robust to the accuracy of the classifier the conformal predictor relies on.
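A minimal sketch of the split conformal prediction sets such a system is built on, assuming calibration-set class probabilities from some classifier. The data below is toy, and the paper's search over conformal predictors is not shown.

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha):
    """Split conformal prediction for multiclass: the nonconformity
    score is 1 - p(true label); the finite-sample quantile of the
    calibration scores gives a threshold so that the true label is
    excluded from the set with probability at most (roughly) alpha."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    return [set(np.where(1.0 - p <= q)[0]) for p in test_probs]

cal_probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9]])
cal_labels = np.array([0, 1, 0, 1])
sets = conformal_sets(cal_probs, cal_labels, np.array([[0.8, 0.2]]), alpha=0.1)
```

Raising `alpha` shrinks the sets (easier for the expert, but the true label is missed more often), which is exactly the trade-off the abstract describes tuning.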
    The Bayesian Learning Rule. (arXiv:2107.04562v3 [stat.ML] UPDATED)
    We show that many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule. The rule, derived from Bayesian principles, yields a wide range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton's method, and the Kalman filter, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout. The key idea in deriving such algorithms is to approximate the posterior using candidate distributions estimated by using natural gradients. Different candidate distributions result in different algorithms, and further approximations to natural gradients give rise to variants of those algorithms. Our work not only unifies, generalizes, and improves existing algorithms, but also helps us design new ones.
    Selecting Robust Features for Machine Learning Applications using Multidata Causal Discovery. (arXiv:2304.05294v5 [stat.ML] UPDATED)
    Robust feature selection is vital for creating reliable and interpretable Machine Learning (ML) models. When designing statistical prediction models in cases where domain knowledge is limited and underlying interactions are unknown, choosing the optimal set of features is often difficult. To mitigate this issue, we introduce a Multidata (M) causal feature selection approach that simultaneously processes an ensemble of time series datasets and produces a single set of causal drivers. This approach uses the causal discovery algorithms PC1 or PCMCI that are implemented in the Tigramite Python package. These algorithms utilize conditional independence tests to infer parts of the causal graph. Our causal feature selection approach filters out causally-spurious links before passing the remaining causal features as inputs to ML models (Multiple linear regression, Random Forest) that predict the targets. We apply our framework to the statistical intensity prediction of Western Pacific Tropical Cyclones (TC), for which it is often difficult to accurately choose drivers and their dimensionality reduction (time lags, vertical levels, and area-averaging). Using more stringent significance thresholds in the conditional independence tests helps eliminate spurious causal relationships, thus helping the ML model generalize better to unseen TC cases. M-PC1 with a reduced number of features outperforms M-PCMCI, non-causal ML, and other feature selection methods (lagged correlation, random), even slightly outperforming feature selection based on eXplainable Artificial Intelligence. The optimal causal drivers obtained from our causal feature selection help improve our understanding of underlying relationships and suggest new potential drivers of TC intensification.
    The most likely common cause. (arXiv:2306.17557v1 [physics.data-an])
    The common cause principle for two random variables $A$ and $B$ is examined in the case of causal insufficiency, when their common cause $C$ is known to exist, but only the joint probability of $A$ and $B$ is observed. As a result, $C$ cannot be uniquely identified (the latent confounder problem). We show that the generalized maximum likelihood method can be applied to this situation and allows identification of $C$ that is consistent with the common cause principle. It closely relates to the maximum entropy principle. Investigation of the two binary symmetric variables reveals a non-analytic behavior of conditional probabilities reminiscent of a second-order phase transition. This occurs during the transition from correlation to anti-correlation in the observed probability distribution. The relation between the generalized likelihood approach and alternative methods, such as predictive likelihood and the minimum common cause entropy, is discussed. The consideration of the common cause for three observed variables (and one hidden cause) uncovers causal structures that defy representation through directed acyclic graphs with the Markov condition.
    Limits of Model Selection under Transfer Learning. (arXiv:2305.00152v3 [stat.ML] UPDATED)
    Theoretical studies on transfer learning or domain adaptation have so far focused on situations with a known hypothesis class or model; however in practice, some amount of model selection is usually involved, often appearing under the umbrella term of hyperparameter-tuning: for example, one may think of the problem of tuning for the right neural network architecture towards a target task, while leveraging data from a related source task. Now, in addition to the usual tradeoffs on approximation vs estimation errors involved in model selection, this problem brings in a new complexity term, namely, the transfer distance between source and target distributions, which is known to vary with the choice of hypothesis class. We present a first study of this problem, focusing on classification; in particular, the analysis reveals some remarkable phenomena: adaptive rates, i.e., those achievable with no distributional information, can be arbitrarily slower than oracle rates, i.e., when given knowledge on distances.
    Efficient uniform approximation using Random Vector Functional Link networks. (arXiv:2306.17501v1 [stat.ML])
    A Random Vector Functional Link (RVFL) network is a depth-2 neural network with random inner weights and biases. As only the outer weights of such architectures need to be learned, the learning process boils down to a linear optimization task, allowing one to sidestep the pitfalls of nonconvex optimization problems. In this paper, we prove that an RVFL with ReLU activation functions can approximate Lipschitz continuous functions provided its hidden layer is exponentially wide in the input dimension. Although it has been established before that such approximation can be achieved in $L_2$ sense, we prove it for $L_\infty$ approximation error and Gaussian inner weights. To the best of our knowledge, our result is the first of this kind. We give a nonasymptotic lower bound for the number of hidden layer nodes, depending on, among other things, the Lipschitz constant of the target function, the desired accuracy, and the input dimension. Our method of proof is rooted in probability theory and harmonic analysis.
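An RVFL with ReLU activations can be sketched directly from the abstract's description: freeze random Gaussian inner weights and biases, then solve a linear least-squares problem for the outer weights. The toy target and width below are illustrative.

```python
import numpy as np

def fit_rvfl(X, y, width, rng):
    """Random Vector Functional Link: the random Gaussian inner weights
    and biases are frozen, so only the outer weights are learned and
    training reduces to linear least squares (no nonconvex optimization)."""
    W = rng.normal(size=(X.shape[1], width))   # random inner weights (fixed)
    b = rng.normal(size=width)                 # random biases (fixed)
    H = np.maximum(X @ W + b, 0.0)             # ReLU hidden features
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return W, b, beta

def predict_rvfl(X, W, b, beta):
    return np.maximum(X @ W + b, 0.0) @ beta

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.abs(X[:, 0]) + 0.5 * X[:, 1]            # a simple Lipschitz target
W, b, beta = fit_rvfl(X, y, width=100, rng=rng)
pred = predict_rvfl(X, W, b, beta)
```

The paper's result bounds how wide `H` must be (exponentially in the input dimension, in the worst case) for such a network to achieve uniform, rather than just $L_2$, approximation of Lipschitz functions.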
    On the Existence of a Complexity in Fixed Budget Bandit Identification. (arXiv:2303.09468v2 [stat.ML] UPDATED)
    In fixed budget bandit identification, an algorithm sequentially observes samples from several distributions up to a given final time. It then answers a query about the set of distributions. A good algorithm will have a small probability of error. While that probability decreases exponentially with the final time, the best attainable rate is not known precisely for most identification tasks. We show that if a fixed budget task admits a complexity, defined as a lower bound on the probability of error which is attained by the same algorithm on all bandit problems, then that complexity is determined by the best non-adaptive sampling procedure for that problem. We show that there is no such complexity for several fixed budget identification tasks including Bernoulli best arm identification with two arms: there is no single algorithm that attains everywhere the best possible rate.
    Online Learning with Set-Valued Feedback. (arXiv:2306.06247v2 [cs.LG] UPDATED)
    We study a variant of online multiclass classification where the learner predicts a single label but receives a \textit{set of labels} as feedback. In this model, the learner is penalized for not outputting a label contained in the revealed set. We show that unlike online multiclass learning with single-label feedback, deterministic and randomized online learnability are \textit{not equivalent} even in the realizable setting with set-valued feedback. Accordingly, we give two new combinatorial dimensions, named the Set Littlestone and Measure Shattering dimension, that tightly characterize deterministic and randomized online learnability respectively in the realizable setting. In addition, we show that the Measure Shattering dimension tightly characterizes online learnability in the agnostic setting. Finally, we show that practical learning settings like online multilabel ranking, online multilabel classification, and online interval learning are specific instances of our general framework.
    A Quantitative Functional Central Limit Theorem for Shallow Neural Networks. (arXiv:2306.16932v1 [math.PR] CROSS LISTED)
    We prove a Quantitative Functional Central Limit Theorem for one-hidden-layer neural networks with generic activation function. The rates of convergence that we establish depend heavily on the smoothness of the activation function, and they range from logarithmic in non-differentiable cases such as the Relu to $\sqrt{n}$ for very regular activations. Our main tools are functional versions of the Stein-Malliavin approach; in particular, we exploit heavily a quantitative functional central limit theorem which has been recently established by Bourguin and Campese (2020).
    Generalized Time Warping Invariant Dictionary Learning for Time Series Classification and Clustering. (arXiv:2306.17690v1 [stat.ML])
    Dictionary learning is an effective tool for pattern recognition and classification of time series data. Among various dictionary learning techniques, dynamic time warping (DTW) is commonly used for dealing with temporal delays, scaling, transformation, and many other kinds of temporal misalignment issues. However, DTW suffers from overfitting or information loss due to its discrete nature in aligning time series data. To address this issue, we propose a generalized time warping invariant dictionary learning algorithm in this paper. Our approach features a generalized time warping operator, which consists of linear combinations of continuous basis functions for facilitating continuous temporal warping. The integration of the proposed operator and the dictionary learning is formulated as an optimization problem, where the block coordinate descent method is employed to jointly optimize warping paths, dictionaries, and sparseness coefficients. The optimized results are then used as hyperspace distance measures to feed classification and clustering algorithms. The superiority of the proposed method in terms of dictionary learning, classification, and clustering is validated through ten sets of public datasets in comparison with various benchmark methods.
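For contrast with the paper's continuous warping operator, the discrete DTW it generalizes is the standard dynamic program over monotone alignments:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic discrete DTW: D[i, j] is the cheapest alignment cost of
    the first i points of `a` with the first j points of `b`, built up
    from match, insertion, and deletion moves."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# A time-shifted copy has zero DTW distance despite the misalignment.
x = [0.0, 1.0, 2.0, 1.0, 0.0]
y = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
d = dtw_distance(x, y)
```

The discrete step-wise choices in the `min` above are what the paper replaces with linear combinations of continuous basis functions, making the warping differentiable and avoiding the quantization that causes overfitting or information loss.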
    Neural Characteristic Activation Value Analysis for Improved ReLU Network Feature Learning. (arXiv:2305.15912v2 [cs.LG] UPDATED)
    We examine the characteristic activation values of individual ReLU units in neural networks. We refer to the corresponding set of points in the input space as the characteristic activation set of a ReLU unit. We draw an explicit connection between the characteristic activation set and learned features in ReLU networks. This connection leads to new insights into why various neural network normalization techniques used in modern deep learning architectures regularize and stabilize SGD optimization. Utilizing these insights, we propose a geometric approach to parameterize ReLU networks for improved feature learning. We empirically verify its usefulness with less carefully chosen initialization schemes and larger learning rates. We report improved optimization stability, faster convergence speed, and better generalization performance.  ( 2 min )
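    For a single ReLU unit x ↦ max(0, w·x + b), the characteristic activation set described above is the hyperplane where the pre-activation crosses zero. A minimal sketch of that definition follows; the paper's geometric parameterization itself is not reproduced here.

```python
def preactivation(w, b, x):
    """Pre-activation of a single ReLU unit x ↦ max(0, w·x + b)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def on_characteristic_set(w, b, x, tol=1e-9):
    """True if x lies on the hyperplane {x : w·x + b = 0}, i.e. where
    the ReLU switches between its active and inactive regions."""
    return abs(preactivation(w, b, x)) < tol

w, b = [1.0, -2.0], 1.0
print(on_characteristic_set(w, b, [1.0, 1.0]))  # 1 - 2 + 1 = 0 → True
print(on_characteristic_set(w, b, [0.0, 0.0]))  # pre-activation 1 → False
```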
    Approximating Probability Distributions by using Wasserstein Generative Adversarial Networks. (arXiv:2103.10060v4 [cs.LG] UPDATED)
    Studied here are Wasserstein generative adversarial networks (WGANs) with GroupSort neural networks as their discriminators. It is shown that the error bound of the approximation for the target distribution depends on the width and depth (capacity) of the generators and discriminators and the number of samples in training. A quantified generalization bound is established for the Wasserstein distance between the generated and target distributions. According to the theoretical results, WGANs have a higher requirement for the capacity of discriminators than that of generators, which is consistent with some existing results. More importantly, the results with overly deep and wide (high-capacity) generators may be worse than those with low-capacity generators if discriminators are insufficiently strong. Numerical results obtained using Swiss roll and MNIST datasets confirm the theoretical results.
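    GroupSort, the discriminator activation studied above, splits the pre-activations into groups and sorts within each group; it is norm-preserving, which matters for the Lipschitz constraint on WGAN discriminators. A minimal sketch, with ascending order within groups as one common convention:

```python
def group_sort(x, group_size):
    """GroupSort activation: split x into consecutive groups of
    `group_size` entries and sort each group (ascending here)."""
    assert len(x) % group_size == 0, "length must divide into groups"
    out = []
    for i in range(0, len(x), group_size):
        out.extend(sorted(x[i:i + group_size]))
    return out

# With groups of two this is the MaxMin-style pairing:
print(group_sort([3.0, -1.0, 2.0, 0.0], 2))  # → [-1.0, 3.0, 0.0, 2.0]
```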
    Off-Policy Evaluation with Out-of-Sample Guarantees. (arXiv:2301.08649v3 [stat.ML] UPDATED)
    We consider the problem of evaluating the performance of a decision policy using past observational data. The outcome of a policy is measured in terms of a loss (aka. disutility or negative reward) and the main problem is making valid inferences about its out-of-sample loss when the past data was observed under a different and possibly unknown policy. Using a sample-splitting method, we show that it is possible to draw such inferences with finite-sample coverage guarantees about the entire loss distribution, rather than just its mean. Importantly, the method takes into account model misspecifications of the past policy - including unmeasured confounding. The evaluation method can be used to certify the performance of a policy using observational data under a specified range of credible model assumptions.
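    The abstract does not spell out its estimator, but a standard building block for evaluating a policy's loss from data logged under a different policy is inverse-propensity weighting. The sketch below assumes the behaviour-policy propensities are known, and it is a textbook device, not the paper's sample-splitting procedure or its coverage machinery.

```python
def ipw_loss_estimate(losses, target_probs, behavior_probs):
    """Inverse-propensity-weighted estimate of the target policy's
    expected loss from logged data. Assumes the behaviour propensities
    of the logged actions are known."""
    n = len(losses)
    return sum(l * tp / bp
               for l, tp, bp in zip(losses, target_probs, behavior_probs)) / n

# When the target policy equals the behaviour policy, all weights are 1
# and the estimate reduces to the empirical mean loss:
losses = [1.0, 0.0, 2.0, 1.0]
probs = [0.5, 0.5, 0.5, 0.5]
print(ipw_loss_estimate(losses, probs, probs))  # → 1.0
```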
    High Dimensional Data Enrichment: Interpretable, Fast, and Data-Efficient. (arXiv:1806.04047v4 [stat.ML] UPDATED)
    We consider the problem of multi-task learning in the high dimensional setting. In particular, we introduce an estimator and investigate its statistical and computational properties for the problem of multiple connected linear regressions known as Data Enrichment/Sharing. The between-tasks connections are captured by a cross-tasks \emph{common parameter}, which gets refined by per-task \emph{individual parameters}. Any convex function, e.g., norm, can characterize the structure of both common and individual parameters. We delineate the sample complexity of our estimator and provide a high probability non-asymptotic bound for estimation error of all parameters under a geometric condition. We show that the recovery of the common parameter benefits from \emph{all} of the pooled samples. We propose an iterative estimation algorithm with a geometric convergence rate and supplement our theoretical analysis with experiments on synthetic data. Overall, we present a first thorough statistical and computational analysis of inference in the data-sharing model.  ( 2 min )
    Uncertainty Estimation by Fisher Information-based Evidential Deep Learning. (arXiv:2303.02045v3 [cs.LG] UPDATED)
    Uncertainty estimation is a key factor that makes deep learning reliable in practical applications. Recently proposed evidential neural networks explicitly account for different uncertainties by treating the network's outputs as evidence to parameterize the Dirichlet distribution, and achieve impressive performance in uncertainty estimation. However, for samples with high data uncertainty that are nevertheless annotated with a one-hot label, the evidence-learning process for the mislabeled classes is over-penalized and remains hindered. To address this problem, we propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL). In particular, we introduce the Fisher Information Matrix (FIM) to measure the informativeness of evidence carried by each sample, according to which we can dynamically reweight the objective loss terms to make the network more focused on the representation learning of uncertain classes. The generalization ability of our network is further improved by optimizing the PAC-Bayesian bound. As demonstrated empirically, our proposed method consistently outperforms traditional EDL-related algorithms in multiple uncertainty estimation tasks, especially in the more challenging few-shot classification settings.  ( 2 min )
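    The evidential parameterization described above maps K non-negative class evidences e to Dirichlet parameters α = e + 1, with total strength S = Σα giving expected class probabilities α/S and a vacuity-style uncertainty mass K/S. The sketch below illustrates only this standard parameterization; the paper's Fisher-information reweighting is not reproduced.

```python
def dirichlet_from_evidence(evidence):
    """Map per-class evidence to Dirichlet parameters, expected class
    probabilities, and a vacuity-style uncertainty mass K / S."""
    alpha = [e + 1.0 for e in evidence]   # α_k = e_k + 1
    S = sum(alpha)                        # total Dirichlet strength
    probs = [a / S for a in alpha]        # expected class probabilities
    uncertainty = len(alpha) / S          # shrinks as evidence grows
    return alpha, probs, uncertainty

# No evidence at all → uniform prediction, maximal uncertainty:
_, probs, u = dirichlet_from_evidence([0.0, 0.0])
print(probs, u)  # → [0.5, 0.5] 1.0
# Strong evidence for class 0 → confident prediction, low uncertainty:
_, probs, u = dirichlet_from_evidence([18.0, 0.0])
print(round(u, 2))  # → 0.1
```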
    First-order ANIL learns linear representations despite misspecified latent dimension. (arXiv:2303.01335v2 [cs.LG] UPDATED)
    Due to its empirical success in few-shot classification and reinforcement learning, meta-learning has recently received significant interest. Meta-learning methods leverage data from previous tasks to learn a new task in a sample-efficient manner. In particular, model-agnostic methods look for initialisation points from which gradient descent quickly adapts to any new task. Although it has been empirically suggested that such methods perform well by learning shared representations during pretraining, there is limited theoretical evidence of such behavior. More importantly, it has not been rigorously shown that these methods still learn a shared structure, despite architectural misspecifications. In this direction, this work shows, in the limit of an infinite number of tasks, that first-order ANIL with a linear two-layer network architecture successfully learns linear shared representations. This result even holds with a misspecified network parameterisation; having a width larger than the dimension of the shared representations results in an asymptotically low-rank solution. The learnt solution then yields a good adaptation performance on any new task after a single gradient step. Overall this illustrates how well model-agnostic methods such as first-order ANIL can learn shared representations.  ( 2 min )
    Sufficient-Statistic Memory AMP. (arXiv:2112.15327v4 [cs.IT] UPDATED)
    Approximate message passing (AMP) type algorithms have been widely used in the signal reconstruction of certain large random linear systems. A key feature of the AMP-type algorithms is that their dynamics can be correctly described by state evolution. While state evolution is a useful analytic tool, its convergence is not guaranteed. To solve the convergence problem of the state evolution of AMP-type algorithms in principle, this paper proposes a sufficient-statistic memory AMP (SS-MAMP) algorithm framework under the conditions of right-unitarily invariant sensing matrices, Lipschitz-continuous local processors and the sufficient-statistic constraint (i.e., the current message of each local processor is a sufficient statistic of the signal vector given the current and all preceding messages). We show that the covariance matrices of SS-MAMP are L-banded and convergent, which is an optimal framework (from the local MMSE/LMMSE perspective) for AMP-type algorithms given the Lipschitz-continuous local processors. Given an arbitrary MAMP, we can construct an SS-MAMP by damping, which not only ensures the convergence of the state evolution, but also preserves the orthogonality, i.e., its dynamics can be correctly described by state evolution. As a byproduct, we prove that the Bayes-optimal orthogonal/vector AMP (BO-OAMP/VAMP) is an SS-MAMP. As an example, we construct a sufficient-statistic Bayes-optimal MAMP (SS-BO-MAMP) whose state evolution converges to the minimum (i.e., Bayes-optimal) mean square error (MSE) predicted by replica methods when it has a unique fixed point. In addition, the MSE of SS-BO-MAMP is not worse than the original BO-MAMP. Finally, simulations are provided to support the theoretical results.  ( 3 min )
    GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond. (arXiv:2211.01962v4 [cs.LG] UPDATED)
    We study sample-efficient reinforcement learning (RL) under the general framework of interactive decision making, which includes Markov decision process (MDP), partially observable Markov decision process (POMDP), and predictive state representation (PSR) as special cases. Toward finding the minimal assumption that enables sample-efficient learning, we propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation in online interactive decision making. Specifically, GEC captures the hardness of exploration by comparing the error of predicting the performance of the updated policy with the in-sample training error evaluated on the historical data. We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR, where generalized regular PSR, a new tractable PSR class identified by us, includes nearly all known tractable POMDPs and PSRs. Furthermore, in terms of algorithm design, we propose a generic posterior sampling algorithm, which can be implemented in both model-free and model-based fashion, under both fully observable and partially observable settings. The proposed algorithm modifies the standard posterior sampling algorithm in two aspects: (i) we use an optimistic prior distribution that biases towards hypotheses with higher values and (ii) a loglikelihood function is set to be the empirical loss evaluated on the historical data, where the choice of loss function supports both model-free and model-based learning. We prove that the proposed algorithm is sample efficient by establishing a sublinear regret upper bound in terms of GEC. In summary, we provide a new and unified understanding of both fully observable and partially observable RL.  ( 3 min )
    A method to integrate and classify normal distributions. (arXiv:2012.14331v9 [stat.ML] UPDATED)
    Univariate and multivariate normal probability distributions are widely used when modeling decisions under uncertainty. Computing the performance of such models requires integrating these distributions over specific domains, which can vary widely across models. Besides some special cases, there exist no general analytical expressions, standard numerical methods or software for these integrals. Here we present mathematical results and open-source software that provide (i) the probability in any domain of a normal in any dimensions with any parameters, (ii) the probability density, cumulative distribution, and inverse cumulative distribution of any function of a normal vector, (iii) the classification errors among any number of normal distributions, the Bayes-optimal discriminability index and relation to the operating characteristic, (iv) dimension reduction and visualizations for such problems, and (v) tests for how reliably these methods may be used on given data. We demonstrate these tools with vision research applications of detecting occluding objects in natural scenes, and detecting camouflage.  ( 3 min )
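    The simplest special case covered above, the probability of a one-dimensional normal over an interval, does have a closed form via the error function. The stdlib-only sketch below illustrates that base case; the paper's tools target general domains in arbitrary dimension.

```python
from math import erf, sqrt

def normal_interval_prob(a, b, mu=0.0, sigma=1.0):
    """P(a < X < b) for X ~ N(mu, sigma^2), via the error function:
    Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))
    return phi((b - mu) / sigma) - phi((a - mu) / sigma)

# The classic 68% rule for one standard deviation:
print(round(normal_interval_prob(-1.0, 1.0), 4))  # → 0.6827
```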
    Variational principle to regularize machine-learned density functionals: the non-interacting kinetic-energy functional. (arXiv:2306.17587v1 [physics.chem-ph])
    Practical density functional theory (DFT) owes its success to the groundbreaking work of Kohn and Sham that introduced the exact calculation of the non-interacting kinetic energy of the electrons using an auxiliary mean-field system. However, the full power of DFT will not be unleashed until the exact relationship between the electron density and the non-interacting kinetic energy is found. Various attempts have been made to approximate this functional, similar to the exchange--correlation functional, with much less success due to the larger contribution of kinetic energy and its more non-local nature. In this work we propose a new and efficient regularization method to train density functionals based on deep neural networks, with particular interest in the kinetic-energy functional. The method is tested on (effectively) one-dimensional systems, including the hydrogen chain, non-interacting electrons, and atoms of the first two periods, with excellent results. For the atomic systems, the generalizability of the regularization method is demonstrated by training also an exchange--correlation functional, and the contrasting nature of the two functionals is discussed from a machine-learning perspective.  ( 2 min )
    Heterogeneous Distributed Lag Models to Estimate Personalized Effects of Maternal Exposures to Air Pollution. (arXiv:2109.13763v3 [stat.ME] UPDATED)
    Children's health studies support an association between maternal environmental exposures and children's birth outcomes. A common goal is to identify critical windows of susceptibility--periods during gestation with increased association between maternal exposures and a future outcome. The timing of the critical windows and magnitude of the associations are likely heterogeneous across different levels of individual, family, and neighborhood characteristics. Using an administrative Colorado birth cohort we estimate the individualized relationship between weekly exposures to fine particulate matter (PM$_{2.5}$) during gestation and birth weight. To achieve this goal, we propose a statistical learning method combining distributed lag models and Bayesian additive regression trees to estimate critical windows at the individual level and identify characteristics that induce heterogeneity from a high-dimensional set of potential modifying factors. We find evidence of heterogeneity in the PM$_{2.5}$-birth weight relationship, with some mother-child dyads showing a 3 times larger decrease in birth weight for an IQR increase in exposure (5.9 to 8.5 $\mu g/m^3$ PM$_{2.5}$) compared to the population average. Specifically, we find increased susceptibility for non-Hispanic mothers who are either younger, have higher body mass index or lower educational attainment. Our case study is the first precision health study of critical windows.  ( 3 min )
    Scalable method for Bayesian experimental design without integrating over posterior distribution. (arXiv:2306.17615v1 [math.NA])
    We address the computational efficiency in solving the A-optimal Bayesian design of experiments problems for which the observational model is based on partial differential equations and, consequently, is computationally expensive to evaluate. A-optimality is a widely used and easy-to-interpret criterion for the Bayesian design of experiments. The criterion seeks the optimal experiment design by minimizing the expected conditional variance, also known as the expected posterior variance. This work presents a novel likelihood-free method for seeking the A-optimal design of experiments without sampling or integrating the Bayesian posterior distribution. In our approach, the expected conditional variance is obtained via the variance of the conditional expectation using the law of total variance, while we take advantage of the orthogonal projection property to approximate the conditional expectation. Through an asymptotic error estimation, we show that the intractability of the posterior does not affect the performance of our approach. We use an artificial neural network (ANN) to approximate the nonlinear conditional expectation to implement our method. For dealing with continuous experimental design parameters, we integrate the training process of the ANN into minimizing the expected conditional variance. Specifically, we propose a non-local approximation of the conditional expectation and apply transfer learning to reduce the number of evaluations of the observation model. Through numerical experiments, we demonstrate that our method significantly reduces the number of observational model evaluations compared with common importance sampling-based approaches. This reduction is crucial, considering the computationally expensive nature of these models.  ( 3 min )
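    The pivot of the approach is the law of total variance, Var(θ) = E[Var(θ|y)] + Var(E[θ|y]), which lets the expected conditional variance be read off as Var(θ) − Var(E[θ|y]) once the conditional expectation is approximated. The sketch below checks the identity exactly on a small discrete joint distribution; it is illustrative only, since the paper approximates E[θ|y] with a neural network.

```python
# Law of total variance on a discrete joint distribution of (y, theta):
# each entry is (y, theta, probability).
joint = [(0, 0.0, 0.25), (0, 2.0, 0.25), (1, 1.0, 0.25), (1, 3.0, 0.25)]

def mean_var(pairs):
    """Mean and variance of a discrete distribution [(value, prob), ...]."""
    m = sum(v * p for v, p in pairs)
    return m, sum(p * (v - m) ** 2 for v, p in pairs)

# Marginal variance of theta.
_, var_theta = mean_var([(t, p) for _, t, p in joint])

# Per-y conditional means and variances, weighted by P(y).
ys = sorted({y for y, _, _ in joint})
cond = []
for y in ys:
    rows = [(t, p) for yy, t, p in joint if yy == y]
    py = sum(p for _, p in rows)
    m, v = mean_var([(t, p / py) for t, p in rows])
    cond.append((py, m, v))

e_cond_var = sum(py * v for py, _, v in cond)                # E[Var(theta|y)]
_, var_cond_mean = mean_var([(m, py) for py, m, _ in cond])  # Var(E[theta|y])

print(var_theta, e_cond_var + var_cond_mean)  # the two sides agree
```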
    The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit. (arXiv:2306.17759v1 [stat.ML])
    In deep learning theory, the covariance matrix of the representations serves as a proxy to examine the network's trainability. Motivated by the success of Transformers, we study the covariance matrix of a modified Softmax-based attention model with skip connections in the proportional limit of infinite-depth-and-width. We show that at initialization the limiting distribution can be described by a stochastic differential equation (SDE) indexed by the depth-to-width ratio. To achieve a well-defined stochastic limit, the Transformer's attention mechanism is modified by centering the Softmax output at identity, and scaling the Softmax logits by a width-dependent temperature parameter. We examine the stability of the network through the corresponding SDE, showing how the scale of both the drift and diffusion can be elegantly controlled with the aid of residual connections. The existence of a stable SDE implies that the covariance structure is well-behaved, even for very large depth and width, thus preventing the notorious issues of rank degeneracy in deep attention models. Finally, we show, through simulations, that the SDE provides a surprisingly good description of the corresponding finite-size model. We coin the name shaped Transformer for these architectural modifications.  ( 2 min )
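    The two modifications named above, centering the Softmax output at the identity and putting a temperature on the logits, can be sketched schematically as attn = I + γ·(softmax(logits/τ) − (1/n)·11ᵀ). Here γ and τ are hypothetical placeholders standing in for the width-dependent scalings; this is a sketch of the shaping idea, not the paper's exact parameterization.

```python
from math import exp

def softmax(row):
    m = max(row)
    e = [exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def shaped_attention(logits, tau, gamma):
    """Schematic 'shaped' attention: identity plus a scaled, centred
    Softmax. gamma and tau stand in for width-dependent scalings."""
    n = len(logits)
    out = []
    for i, row in enumerate(logits):
        soft = softmax([v / tau for v in row])
        # Centre the Softmax row (it then sums to 0) and add the identity.
        out.append([(1.0 if i == j else 0.0) + gamma * (soft[j] - 1.0 / n)
                    for j in range(n)])
    return out

A = shaped_attention([[0.3, -0.1], [0.2, 0.4]], tau=4.0, gamma=0.5)
print([round(sum(row), 6) for row in A])  # rows still sum to 1 → [1.0, 1.0]
```

Centering makes the attention matrix a small perturbation of the identity, which is what keeps the covariance well-behaved in the infinite depth-and-width limit, while the row sums of an attention matrix are preserved.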
    A Bayesian Filtering Algorithm for Gaussian Mixture Models. (arXiv:1705.05495v2 [stat.ML] UPDATED)
    A Bayesian filtering algorithm is developed for a class of state-space systems that can be modelled via Gaussian mixtures. In general, the exact solution to this filtering problem involves an exponential growth in the number of mixture terms and this is handled here by utilising a Gaussian mixture reduction step after both the time and measurement updates. In addition, a square-root implementation of the unified algorithm is presented and this algorithm is profiled on several simulated systems. This includes the state estimation for two non-linear systems that are strictly outside the class considered in this paper.  ( 2 min )
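    The simplest building block of a Gaussian mixture reduction step is the moment-matched merge of two weighted components into one. The sketch below shows the 1-D case; it is illustrative only, and the paper's specific reduction rule and square-root implementation are not reproduced.

```python
def merge_components(w1, m1, v1, w2, m2, v2):
    """Moment-matched merge of two 1-D Gaussian components
    (weight, mean, variance) into a single component."""
    w = w1 + w2
    a1, a2 = w1 / w, w2 / w
    m = a1 * m1 + a2 * m2
    # Total variance: within-component plus between-component spread.
    v = a1 * (v1 + (m1 - m) ** 2) + a2 * (v2 + (m2 - m) ** 2)
    return w, m, v

# Two equally weighted unit-variance components at ±1:
print(merge_components(0.5, -1.0, 1.0, 0.5, 1.0, 1.0))  # → (1.0, 0.0, 2.0)
```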
    Private Online Prediction from Experts: Separations and Faster Rates. (arXiv:2210.13537v3 [cs.LG] UPDATED)
    Online prediction from experts is a fundamental problem in machine learning and several works have studied this problem under privacy constraints. We propose and analyze new algorithms for this problem that improve over the regret bounds of the best existing algorithms for non-adaptive adversaries. For approximate differential privacy, our algorithms achieve regret bounds of $\tilde{O}(\sqrt{T \log d} + \log d/\varepsilon)$ for the stochastic setting and $\tilde{O}(\sqrt{T \log d} + T^{1/3} \log d/\varepsilon)$ for oblivious adversaries (where $d$ is the number of experts). For pure DP, our algorithms are the first to obtain sub-linear regret for oblivious adversaries in the high-dimensional regime $d \ge T$. Moreover, we prove new lower bounds for adaptive adversaries. Our results imply that unlike the non-private setting, there is a strong separation between the optimal regret for adaptive and non-adaptive adversaries for this problem. Our lower bounds also show a separation between pure and approximate differential privacy for adaptive adversaries where the latter is necessary to achieve the non-private $O(\sqrt{T})$ regret.  ( 2 min )
    Case-Base Neural Networks: survival analysis with time-varying, higher-order interactions. (arXiv:2301.06535v2 [stat.ML] UPDATED)
    Neural network-based survival methods can model data-driven covariate interactions. While these methods can provide better predictive performance than regression-based approaches, not all can model time-varying interactions and complex baseline hazards. To address this, we propose Case-Base Neural Networks (CBNNs) as a new approach that combines the case-base sampling framework with flexible neural network architectures. Using a novel sampling scheme and data augmentation to naturally account for censoring, we construct a feed-forward neural network that may take time as an input. CBNNs predict the probability of an event occurring at a given moment to estimate the hazard function. We compare the performance of CBNNs to regression and neural network-based survival methods in a simulation and three case studies using two time-dependent metrics. First, we examine performance on a simulation involving a complex baseline hazard and time-varying interactions to assess all methods, with CBNN outperforming competitors. Then, we apply all methods to three real data applications, with CBNNs outperforming the competing models in two studies and showing similar performance in the third. Our results highlight the benefit of combining case-base sampling with deep learning to provide a simple and flexible modeling framework for data-driven, time-varying interaction modeling of single event survival outcomes. An R package is available at https://github.com/Jesse-Islam/cbnn.  ( 2 min )
    Causal Rule Ensemble: Interpretable Discovery and Inference of Heterogeneous Treatment Effects. (arXiv:2009.09036v5 [stat.ME] UPDATED)
    In health and social sciences, it is critically important to identify subgroups of the study population where there is notable heterogeneity of treatment effects (HTE) with respect to the population average. Decision trees have been proposed and commonly adopted for data-driven discovery of HTE due to their high level of interpretability. However, single-tree discovery of HTE can be unstable and oversimplified. This paper introduces Causal Rule Ensemble (CRE), a new method for HTE discovery and estimation through an ensemble-of-trees approach. CRE offers several key features, including 1) an interpretable representation of the HTE; 2) the ability to explore complex heterogeneity patterns; and 3) high stability in subgroups discovery. The discovered subgroups are defined in terms of interpretable decision rules. Estimation of subgroup-specific causal effects is performed via a two-stage approach for which we provide theoretical guarantees. Via simulations, we show that the CRE method is highly competitive when compared to state-of-the-art techniques. Finally, we apply CRE to discover the heterogeneous health effects of exposure to air pollution on mortality for 35.3 million Medicare beneficiaries across the contiguous U.S.  ( 2 min )
    Global Optimality in Bivariate Gradient-based DAG Learning. (arXiv:2306.17378v1 [cs.LG])
    Recently, a new class of non-convex optimization problems motivated by the statistical problem of learning an acyclic directed graphical model from data has attracted significant interest. While existing work uses standard first-order optimization schemes to solve this problem, proving the global optimality of such approaches has proven elusive. The difficulty lies in the fact that unlike other non-convex problems in the literature, this problem is not "benign", and possesses multiple spurious solutions that standard approaches can easily get trapped in. In this paper, we prove that a simple path-following optimization scheme globally converges to the global minimum of the population loss in the bivariate setting.  ( 2 min )
    Capturing functional connectomics using Riemannian partial least squares. (arXiv:2306.17371v1 [stat.ME])
    For neurological disorders and diseases, functional and anatomical connectomes of the human brain can be used to better inform targeted interventions and treatment strategies. Functional magnetic resonance imaging (fMRI) is a non-invasive neuroimaging technique that captures spatio-temporal brain function through blood flow over time. fMRI can be used to study the functional connectome through the functional connectivity matrix; that is, Pearson's correlation matrix between time series from the regions of interest of an fMRI image. One approach to analysing functional connectivity is using partial least squares (PLS), a multivariate regression technique designed for high-dimensional predictor data. However, analysing functional connectivity with PLS ignores a key property of the functional connectivity matrix; namely, these matrices are positive definite. To account for this, we introduce a generalisation of PLS to Riemannian manifolds, called R-PLS, and apply it to symmetric positive definite matrices with the affine invariant geometry. We apply R-PLS to two functional imaging datasets: COBRE, which investigates functional differences between schizophrenic patients and healthy controls; and ABIDE, which compares people with autism spectrum disorder and neurotypical controls. Using the variable importance in the projection statistic on the results of R-PLS, we identify key functional connections in each dataset that are well represented in the literature. Given the generality of R-PLS, this method has potential to open up new avenues for multi-modal imaging analysis linking structural and functional connectomics.  ( 2 min )
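    The affine-invariant geometry on SPD matrices is easiest to see in the 1×1 case, where the geodesic distance reduces to |log(b/a)| and affine invariance means congruence (here, scaling by any c > 0) leaves distances unchanged. The full matrix case replaces log(b/a) with the matrix logarithm of A^{-1/2} B A^{-1/2}; the scalar sketch below is illustrative only.

```python
from math import log

def spd_dist_1d(a, b):
    """Affine-invariant geodesic distance between 1x1 SPD 'matrices'
    (positive scalars): d(a, b) = |log(a^{-1/2} b a^{-1/2})| = |log(b/a)|."""
    assert a > 0 and b > 0, "SPD entries must be positive"
    return abs(log(b / a))

a, b, c = 2.0, 8.0, 5.0
print(spd_dist_1d(a, b))          # log(4)
print(spd_dist_1d(c * a, c * b))  # unchanged under scaling: log(4)
```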
    iSCAN: Identifying Causal Mechanism Shifts among Nonlinear Additive Noise Models. (arXiv:2306.17361v1 [cs.LG])
    Structural causal models (SCMs) are widely used in various disciplines to represent causal relationships among variables in complex systems. Unfortunately, the true underlying directed acyclic graph (DAG) structure is often unknown, and determining it from observational or interventional data remains a challenging task. However, in many situations, the end goal is to identify changes (shifts) in causal mechanisms between related SCMs rather than recovering the entire underlying DAG structure. Examples include analyzing gene regulatory network structure changes between healthy and cancerous individuals or understanding variations in biological pathways under different cellular contexts. This paper focuses on identifying $\textit{functional}$ mechanism shifts in two or more related SCMs over the same set of variables -- $\textit{without estimating the entire DAG structure of each SCM}$. Prior work under this setting assumed linear models with Gaussian noises; instead, in this work we assume that each SCM belongs to the more general class of nonlinear additive noise models (ANMs). A key contribution of this work is to show that the Jacobian of the score function for the $\textit{mixture distribution}$ allows for identification of shifts in general non-parametric functional mechanisms. Once the shifted variables are identified, we leverage recent work to estimate the structural differences, if any, for the shifted variables. Experiments on synthetic and real-world data are provided to showcase the applicability of this approach.  ( 3 min )
    An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue Optimization Problems. (arXiv:2306.17470v1 [math.OC])
    In this work, we revisit the problem of solving large-scale semidefinite programs using randomized first-order methods and stochastic smoothing. We introduce two oblivious stochastic mirror descent algorithms based on a complementary composite setting. One algorithm is designed for non-smooth objectives, while an accelerated version is tailored for smooth objectives. Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function. For the non-smooth case with $\mathcal{M}$-bounded oracles, we prove a convergence rate of $ O( {\mathcal{M}}/{\sqrt{T}} ) $. For the $L$-smooth case with a feasible set bounded by $D$, we derive a convergence rate of $ O( {L^2 D^2}/{(T^{2}\sqrt{T})} + {(D_0^2+\sigma^2)}/{\sqrt{T}} )$, where $D_0$ is the starting distance to an optimal solution, and $ \sigma^2$ is the stochastic oracle variance. These rates had only been obtained so far by either assuming prior knowledge of the Lipschitz constant or the starting distance to an optimal solution. We further show how to extend our framework to relative scale and demonstrate the efficiency and robustness of our methods on large-scale semidefinite programs.  ( 2 min )
    Why Shallow Networks Struggle with Approximating and Learning High Frequency: A Numerical Study. (arXiv:2306.17301v1 [cs.LG])
    In this work, a comprehensive numerical study involving analysis and experiments shows why a two-layer neural network has difficulties handling high frequencies in approximation and learning when machine precision and computation cost are important factors in real practice. In particular, the following fundamental computational issues are investigated: (1) the best accuracy one can achieve given a finite machine precision, (2) the computation cost to achieve a given accuracy, and (3) stability with respect to perturbations. The key to the study is the spectral analysis of the corresponding Gram matrix of the activation functions which also shows how the properties of the activation function play a role in the picture.  ( 2 min )
    Kernel $\epsilon$-Greedy for Contextual Bandits. (arXiv:2306.17329v1 [stat.ML])
    We consider a kernelized version of the $\epsilon$-greedy strategy for contextual bandits. More precisely, in a setting with finitely many arms, we consider that the mean reward functions lie in a reproducing kernel Hilbert space (RKHS). We propose an online weighted kernel ridge regression estimator for the reward functions. Under some conditions on the exploration probability sequence, $\{\epsilon_t\}_t$, and choice of the regularization parameter, $\{\lambda_t\}_t$, we show that the proposed estimator is consistent. We also show that for any choice of kernel and the corresponding RKHS, we achieve a sub-linear regret rate depending on the intrinsic dimensionality of the RKHS. Furthermore, we achieve the optimal regret rate of $\sqrt{T}$ under a margin condition for finite-dimensional RKHS.  ( 2 min )
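    The strategy described above can be sketched with a plain RBF kernel ridge estimator per arm: with probability ε explore uniformly at random, otherwise pull the arm with the largest estimated mean reward at the current context. All names and constants below are illustrative, and the paper's online weighting and schedules for $\{\epsilon_t\}_t$ and $\{\lambda_t\}_t$ are omitted.

```python
import numpy as np

def rbf(X, Z, gamma=1.0):
    """RBF kernel k(x, z) = exp(-gamma * (x - z)^2) for 1-D contexts."""
    return np.exp(-gamma * (X[:, None] - Z[None, :]) ** 2)

def krr_mean(X, y, x_new, lam=1e-2):
    """Kernel ridge estimate of the mean reward at context x_new."""
    alpha = np.linalg.solve(rbf(X, X) + lam * np.eye(len(X)), y)
    return float((rbf(np.array([x_new]), X) @ alpha)[0])

def eps_greedy(x, arms, eps, rng):
    """With prob. eps explore uniformly; else pull the arm whose kernel
    ridge reward estimate at context x is largest. `arms` is a list of
    (context_history, reward_history) pairs, one per arm."""
    if rng.random() < eps:
        return int(rng.integers(len(arms)))
    return int(np.argmax([krr_mean(X, y, x) for X, y in arms]))

rng = np.random.default_rng(0)
# Arm 0 rewards ≈ x, arm 1 rewards ≈ 1 - x, logged on a small grid:
grid = np.linspace(0.0, 1.0, 11)
arms = [(grid, grid), (grid, 1.0 - grid)]
print(eps_greedy(0.9, arms, eps=0.0, rng=rng))  # greedy picks arm 0
print(eps_greedy(0.1, arms, eps=0.0, rng=rng))  # greedy picks arm 1
```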
  • Open

    Best Books to Learn Neural Networks in 2023 for Beginners (Updated)
    submitted by /u/Lakshmireddys [link] [comments]  ( 8 min )

  • Open

    What ai voice generator did they use for this video?
    submitted by /u/spicywasabi [link] [comments]  ( 8 min )
    Voice Modulation
    I am really interested in self hosting my own voice modulation. Particularly impersonating cartoon characters. How can someone develop and train one of these models? submitted by /u/jesus-da-wizard [link] [comments]  ( 8 min )
    Does ChatGPT Keep Our Data?
    Do you think ChatGPT keeps our data? submitted by /u/Imagine-your-success [link] [comments]  ( 8 min )
    How to Fix AutoGPT and Build a Proto-AGI
    submitted by /u/lorepieri [link] [comments]  ( 8 min )
    One-Minute Daily AI News 7/2/2023
    Moody’s Corp. is using Microsoft Corp. and OpenAI to create an artificial intelligence assistant that will help customers of the credit rating and research firm analyze reams of information needed to make assessments of risk. “Moody’s Research Assistant” will roll out to customers including analysts, bankers, advisers, researchers, and investors.[1] Unity announces the release of Muse: A Text-to-Video Games Platform that lets you create textures, sprites, and animations with natural languages.[2] The New York State Legislature passed a number of bills this session, including one that would ban “deepfake” images online. Deepfakes are images or videos that have been manipulated to make it appear as if someone is saying or doing something they never said or did. The bill would make it illegal to create or distribute deepfakes that are used to harm or humiliate someone.[3] As per Times Now Report, Reece Wiench, 23, and Deyton Truitt, 26, decided to break away from tradition by holding a unique wedding ceremony. Instead of a physical human officiant, the couple opted for a machine featuring ChatGPT. The machine, adorned with a mask resembling the famous C-3PO from Star Wars, took center stage.[4] Sources: [1] https://www.bloomberg.com/news/articles/2023-06-29/moody-s-will-use-microsoft-openai-for-research-tool-to-assess-risk [2] https://www.marktechpost.com/2023/06/30/unity-announce-the-release-of-muse-a-text-to-video-games-platform-that-lets-you-create-textures-sprites-and-animations-with-natural-language/ [3] https://www.newsday.com/news/region-state/state-legislature-deepfakes-artificial-intelligence-liquor-store-hours-kathy-hochul-ks257o8t [4] https://odishatv.in/news/technology/chatgpt-officiates-wedding-couple-employs-ai-as-host-for-unusual-ceremony-208457 submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    AI to create songs from my lyrics like Songr
Are there any other AI apps or sites like Songr that basically create and sing a song out of the lyrics you give them? I quite enjoy this one, and I want to know if there are any others. submitted by /u/Would-Be-Superhero [link] [comments]  ( 8 min )
    What is the best way to clone a voice these days?
    Just wondering what GitHub repo or other open-source option exists to create a voice model and use it in a text-to-speech engine? submitted by /u/suprem_lux [link] [comments]  ( 8 min )
    Recommendations for generative AI that makes diagrams and simple animations of schemes?
I'm looking for a generative AI that makes diagrams, ideally animated ones. So far Google hasn't helped me find anything great. submitted by /u/fjanko [link] [comments]  ( 8 min )
    Introducing Interview AI - Get 100% job interview ready with AI
Hey there, amazing r/artificial community! I'm thrilled to introduce you to Interview AI, the ultimate companion for interview preparation. Whether you're aiming for your dream job or want to level up your interview skills, Interview AI is here to support you every step of the way. What is Interview AI? Interview AI is your round-the-clock interview coach, designed to help you excel in job interviews. We understand the importance of targeted and personalized interview practice, and that's why we've created a platform that offers tailored questions and detailed feedback, just for you. PS: We have just launched Interview AI on Product Hunt; we would love it if you could check out the product and give us some feedback, and (of course) we would really appreciate your support 😄🙏. https://www.producthunt.com/posts/interview-ai submitted by /u/data-leon [link] [comments]  ( 8 min )
    Can you help me create a Home companion? Ideas, Suggestions welcome.
Setting the post to NSFW because of the mention of sex doll AI. My situation is unique in some ways; please read to the end before offering any suggestions or help. I live my life in my house; I probably leave my home 2-4 times a year. I am classed as severely disabled, mostly mental issues from autism, ADHD, bipolar, OCD and PTSD, and I am also in remission from bowel cancer. For 20 years I used technology to help look after my wife; when she died, technology was the only thing that kept me in this world. Group therapy didn't work, step programs didn't work, 5 different therapists, clinical psychologists, medication and even a long hospital stay didn't work. But technology did, ever since the first gaming console in 1976 (a Binatone Master System) and the first handheld in 1980 (Galaxy Invader…  ( 10 min )
    It's the summer of love, baby !
    submitted by /u/Akumetsu_971 [link] [comments]  ( 8 min )
    Do OpenAI and the like pay creators at all for the huge amount of data they use to train their language models?
    Just wondering, since for private persons, copyright is very strictly enforced submitted by /u/Aesthetik_1 [link] [comments]  ( 8 min )
    Is there a pro active AI or expert system or companion system?
I am looking for a system that will auto-initiate contact with a user, not wait for a prompt, and that can activate routines, make suggestions and generally take actions within a framework without the need of a prompt. This is for a companion AI that will help me deal with isolation due to disabilities. It will be set up in my house and be more a friend/companion than just an assistant. I want something that will get to know my likes and dislikes, suggest a new game to try or plan out a movie night for me, and remind and query me on whether or not I took my tablets; a mother figure, as it were. This is all possible with a scripted system, and if you could add in something like a ChatGPT feed it would be even better. Any ideas of a good framework? Hi all, after the amazing responses I'll be posting a new topic, which I hope you all join me in. (9) Reddit - Dive into anything It is only marked as NSFW because the AI companion is based on one that is used as a S doll AI cloud submitted by /u/Quebber [link] [comments]  ( 8 min )
    The Overlooked Power of AI: Why Are People Underestimating its Potential?
I recently had an eye-opening experience with AI that left me both amazed and somewhat unnerved. It made me ponder why so many people seem to vastly underestimate the true capabilities of artificial intelligence. So, I wanted to share my story and engage in a discussion about this phenomenon. I decided to set up my own AI assistant with Jarvis-like voice commands. Armed with Auto-GPT and connected to a REST API, this AI proved to be quite extraordinary. As a first test, I asked it to create an Express/Node.js web app for me. To my astonishment, it delved into extensive research on Express, generated code, saved files, debugged them in real-time, and even ran the app on a localhost server for me to view. It was more than just a chatbot; it was a practical problem-solving companion. …  ( 10 min )
    Where do people make AI advertisement videos?
    These days I'm often seeing advertisement videos online that are made using AI. Can anyone tell me what this AI is and how I can use it? submitted by /u/idontbelievestuff1 [link] [comments]  ( 8 min )
    Gnothi: open-source AI journal. Insights & resources include GPT-powered prompt -which uses entry history as context - behavior analysis, book recommendations, entry summaries, and recurring themes.
Website, Github. My partner u/mVadr and I just launched Gnothi (v1). We’re both psychology/personal development enthusiasts who work in tech. We started Gnothi to get us journaling more, and to leverage machine learning to optimize the process. Overview of features: * Journal free. Unlimited entries. We created this to use in our daily lives. It helped our mental health and gave us clear direction in life. * Prompt premium. In the GPT era, we all know prompts. What's different is that Gnothi uses your prior entries (based on filters) as background context for the conversation, so you don't need to set the stage with GPT. You can just ask what you want to ask. I recommend trying this on a dream. Record a dream, run the dream interpretation prompt preset, and be amazed. Dream interpretation was an ah…  ( 10 min )
  • Open

    PPO is off-policy in RLHF(LLM)?
Why do people say that PPO is off-policy in RLHF (language models)? This doesn't make any sense to me; PPO is an on-policy algorithm. The reward model is trained using pre-collected data, but regardless of that, the reward model is tied to the idea of off-policy learning, right? Why do some people claim that when training an LLM with RLHF and PPO, you are doing PPO in an off-policy fashion? submitted by /u/No_Oilve_6577 [link] [comments]  ( 8 min )
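For what it's worth, the usual source of the "slightly off-policy" framing is that PPO reuses each sampled batch for several gradient epochs: the data comes from a frozen old policy, and the clipped importance ratio corrects for the small mismatch. A minimal numpy sketch of the clipped surrogate (all values illustrative):

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantage, eps=0.2):
    """Clipped surrogate objective. The batch was sampled from the *old*
    policy, so the ratio pi_new/pi_old corrects the slight off-policyness
    introduced by reusing the batch for multiple epochs."""
    ratio = np.exp(logp_new - logp_old)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.minimum(ratio * advantage, clipped * advantage)

# A ratio of 1.5 with positive advantage gets clipped to 1 + eps = 1.2:
obj = ppo_clip_objective(np.log(1.5), np.log(1.0), advantage=1.0)
```

In pure on-policy PPO (one epoch per batch) the ratio starts at 1; the "off-policy" label in RLHF discussions usually refers only to this within-batch reuse, not to off-policy learning in the DQN sense.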
    Normalized Observation that has a shifted Normal Distribution with a bar at 0
I want to normalize the observations for my reinforcement learning agent, but one observation has a distribution like a normal distribution with a shifted mean plus a point mass at 0: about 50% of the time the observation is 0, and the other 50% of the time it is drawn from a normal distribution with mean 50. How should I normalize this type of observation? Thanks submitted by /u/LeSUTHU [link] [comments]  ( 8 min )
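One common approach for a zero-inflated observation like this (a sketch, assuming numpy and the numbers in the question; the std is an assumed value to estimate from your data) is to normalize the two parts separately: emit a binary is-nonzero indicator plus a z-score of the non-zero component.

```python
import numpy as np

def normalize_zero_inflated(obs, mean=50.0, std=10.0):
    """Map a zero-inflated observation to (is_nonzero, z-score).

    Zeros become (0.0, 0.0); non-zeros become (1.0, (x - mean) / std)."""
    obs = np.asarray(obs, dtype=float)
    is_nonzero = (obs != 0).astype(float)
    z = np.where(is_nonzero > 0, (obs - mean) / std, 0.0)
    return np.stack([is_nonzero, z], axis=-1)

features = normalize_zero_inflated([0.0, 50.0, 60.0])
```

This keeps the agent from conflating "observation is 0" with "observation is 5 standard deviations below the mean".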
    EvoTorch using custom environment
Hello experts, does anybody have experience using a custom environment (not gym) and evaluating it with the evotorch library? So far all the examples in their documentation are based on GymNE, which is a built-in class, but my environment is entirely hand-made. I can't seem to find anything on GitHub either. submitted by /u/vincent0110 [link] [comments]  ( 8 min )
  • Open

    [R] Automated CPU Design With AI
    submitted by /u/EducationalCicada [link] [comments]  ( 8 min )
    [Discussion] What opportunities exist for companies to develop domain-specific LLM models?
    I was wondering if there is a need for LLM models that are for a specific purpose. For example, I am imagining an LLM model that is specialized for generating code that would be better at generating code than a general purpose LLM. Or is there no need for this because of the big general purpose LLMs that are trending? submitted by /u/BornAgain20Fifteen [link] [comments]  ( 8 min )
    [D] Validation accuracy greater than training accuracy
I'm new to the deep learning domain and I just built a model to classify 9 classes. But somehow I get a validation accuracy of 0.89 and a training accuracy of 0.7, even though I use the same data. This is the link to my notebook: https://www.kaggle.com/code/ayoubsarab/mammals-classification-high-accuracy submitted by /u/Ordinary_Run_2513 [link] [comments]  ( 8 min )
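One benign cause worth checking (besides regularization such as dropout being active only at training time): many frameworks report training accuracy averaged over the whole epoch while the weights are still improving, but validation accuracy is measured once at the end of the epoch. An illustrative sketch with made-up numbers:

```python
# Per-batch training accuracies recorded while the model improves
# during a single epoch (illustrative values):
batch_train_acc = [0.4, 0.6, 0.8, 0.9]
epoch_train_acc = sum(batch_train_acc) / len(batch_train_acc)

# Validation accuracy is measured once, with the end-of-epoch weights:
end_of_epoch_val_acc = 0.88

# The reported "training accuracy" can therefore sit below the
# "validation accuracy" even with identical data difficulty.
```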
    [P] P40s not working?
Hi, I'm trying to get 2x P40s working on an Asus Z270-P. I've got Above 4G Decoding on and tried setting everything to Gen2. I removed all the drives and M.2s, but it still won't POST. If I remove either of the two cards it POSTs just fine and boots, but with two it refuses to do anything. Any ideas? submitted by /u/ISwearImNotAnAI [link] [comments]  ( 8 min )
    [R] A visual critical review of most of the time series anomaly detection (TSAD) datasets
    submitted by /u/eamonnkeogh [link] [comments]  ( 8 min )
    [P] Source Code - Executable Dataset
I need to make a large dataset of source-code/executable pairs. Web-scraping executables is easy, but finding matching source code is not. I'm thinking of generating the source code using ML and then compiling it. Which models would best fit this task? I don't have a specific max file size yet, but a model which can generate bigger programs in many languages with a variety of libraries is desirable. I'm currently looking at StarCoder. submitted by /u/boringdude123 [link] [comments]  ( 8 min )
    [D] Can I use faiss as retrieve on recommendation system?
I'm planning on using faiss to generate candidates and then LightGBM to rank them. I'm thinking about using e-commerce data, as most of the faiss examples I see use textual data. Is faiss a good approach for this? submitted by /u/Mota287 [link] [comments]  ( 8 min )
    [R] Adversarial Robust Deep Reinforcement Learning Requires Redefining Robustness
    https://ojs.aaai.org/index.php/AAAI/article/view/26009/25781 submitted by /u/ml_dnn [link] [comments]  ( 8 min )
    [D] ML Engineer vs. MLOps Engineer
    As the need for ML infra grows, the confusion between the "ML Engineer" vs. "MLOps Engineer" roles seems to be steadily increasing, and with it the number of online articles on the subject (quite a lot here, including our own recent blog too). Do we think this is something that will get clearer as the industry matures? Or is it more of a "giff" vs. "jiff" type situation? Or will this not matter once the "Prompt Engineers" wipe us all out? ;) https://preview.redd.it/35kxge9fyk9b1.png?width=500&format=png&auto=webp&s=b2fb0394ddc02b18be27bb50e2c59a4a18cb7a83 submitted by /u/kazhdan_d [link] [comments]  ( 8 min )
    [D] Latest techniques on reducing the learning rate sensitivity?
I've observed sensitivity to the LR schedule in the transformer architecture I've been training on a sequence generation task. I'm using the Adam optimiser. Changing the LR schedule from linear decay to cubic decay seems to have made it much worse. I use warmup steps. Now I'm worried that even linear decay is much worse than something else I haven't tested. The loss curve is well behaved in both cases, but the accuracy of one is much better than the other. Is there a robust way to schedule the learning rate in transformers? submitted by /u/Professor_Entropy [link] [comments]  ( 8 min )
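For reference, the most common robust baseline for transformers is linear warmup followed by a decay (linear or cosine); a sketch of warmup-then-linear-decay with made-up step counts:

```python
def lr_at_step(step, base_lr=3e-4, warmup_steps=1000, total_steps=10000):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    frac = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * max(0.0, 1.0 - frac)
```

The original Transformer paper used an inverse-square-root decay after warmup instead; trying both, and tuning the warmup length together with the peak LR, often reduces the sensitivity more than the decay shape alone.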
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 8 min )
    [D] The road to Machine Learning/Deep Learning Research for a high schooler
    Hello r/MachineLearning I am a high schooler right now and I have had some previous experience with ML and DL. I recently completed fast.ai's course "Practical Deep Learning for Coders Part 1". Currently, I am taking MITOCW's Linear Algebra course and will take a multivariable calculus course when I am finished with that, as I understand how important math is to ML/DL. Pytorch is also something I will take as well. My ultimate goal is to be able to conduct research with ML/DL. Specifically, to research the models themselves and to create new models. That is why I am taking such a math-oriented approach to learning ML/DL. My main question really is what is the path I should take to get to academic research with ML/DL? The math and the frameworks I consider to be the pre-requisites, and after these steps, I am unsure of how to break into the academic space. Any tips or if someone could speak from experience would be greatly appreciated! I am super excited about academic research with ML and I hope I will get there one day! Thanks! submitted by /u/Mr_Funkedeli [link] [comments]  ( 9 min )
    [D] ELI5: Why is the GPT family of models based on the decoder-only architecture?
    I recently dove into the nitty-gritty of these models for the first time. The transformer architecture was intriguing on its own but learning about its variations is fascinating. I am curious to understand the reason behind the choice of GPT models' architecture if that's possible to explain coherently here. I understand particular architectures are better suited for certain tasks but the GPT and similar LLMs are doing really well across a variety of tasks. If scale has something to do with it, does decoder-only architecture scale better? submitted by /u/analyticalmonk [link] [comments]  ( 8 min )
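One concrete piece of the answer: a decoder-only model needs nothing more than a causal (lower-triangular) attention mask, so every position is simultaneously a training example for next-token prediction. A numpy sketch of that mask:

```python
import numpy as np

def causal_mask(seq_len):
    """Lower-triangular boolean mask: position i may attend only to
    positions j <= i, which is what next-token prediction requires."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

mask = causal_mask(4)
```

Because every prefix of every sequence provides a training signal, the decoder-only objective extracts supervision very efficiently from raw text, which is part of why it scales so well.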
    [Discussion] Is this simplified explanation of the process of noise in Stable Diffusion true?
How does text-to-image work? It's like teaching an artist about our visual world -- object definitions, shapes, dimensions, etc. -- and how they correspond to what the person who commissioned the art asked for (text prompts). The artist then watches hundreds of tesserae (rectangular slabs used to create a mosaic) being inserted into a mosaic -- say of an ice cream -- and then removed to restore the original mosaic. During this, the artist learns how to understand, recreate, and reinterpret the ‘ice cream’ image in other mosaics. The artist goes through this with millions of other depictions in mosaics (objects, locations, etc.) so they can create entirely new mosaics based on the requests (or text prompts) of the person commissioning them. Sampling steps are like commissioning an artist to interpret and construct a mosaic quickly or carefully: the more detail or accuracy you want, the more work and time have to go into it. submitted by /u/Neat-Bend-1190 [link] [comments]  ( 8 min )
    [Discussion] Resume/CV/Human Resources Recommender (System) Questions
My team was given a small project for our Stochastic Processes and Applications class this semester. The problem is building a suggestion system in the field of human resources: given a recruitment posting and N job applications, suggest K job applications to the employer. Though it is an SPA class project, we don't have to use an SPA-related approach, and we are allowed to apply an existing research paper: understand the paper, do the coding, and develop it further if possible. When doing some searching on the Internet, I found that this kind of problem has not been researched as much as the job recommender problem. So I have some questions that need answers: Can I apply the solution from a job recommender research paper to solve my problem? Are there any differences between the datasets applied to these problems? Can you suggest some state-of-the-art research papers about this? Thank you all. submitted by /u/moss_beevcl1 [link] [comments]  ( 8 min )
    [Discussion] Is it possible to get a validation error less than the training error for a classification task where the error is the fraction of misclassification?
In a classification task using a neural network, I computed the fraction of misclassifications as the error, and I am getting a validation error lower than the training error. NOTE: the output layer is linear because loss = BinaryCrossentropy(from_logits=True). Project link: https://github.com/dipenpandit/diabetes-prediction (attached screenshots: model architecture; training models and computing error; comparing errors) submitted by /u/Dipen1DP [link] [comments]  ( 8 min )
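For anyone reproducing this: with from_logits=True the linear output layer emits raw scores, so the misclassification fraction comes from thresholding the logit at 0 (sigmoid(0) = 0.5). A sketch with illustrative values:

```python
import numpy as np

def misclassification_rate(logits, labels):
    """Fraction of examples where thresholding the raw logit at 0
    (equivalent to sigmoid(logit) > 0.5) disagrees with the 0/1 label."""
    preds = (np.asarray(logits) > 0).astype(int)
    return float(np.mean(preds != np.asarray(labels)))

err = misclassification_rate([2.3, -1.1, 0.4, -0.2], [1, 0, 0, 1])
```

A validation error below the training error is possible when regularization (e.g. dropout) is active only during training, or when the training metric is averaged over an epoch of improving weights.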
    [D] where to start
I have recently developed an interest in machine learning and the way things learn, but at the same time I have no idea where to start, since most of the online tutorials are confusing. Any recommendations on where to start? (Yes, I have knowledge of Python, calculus and stats.) submitted by /u/PrestigiousCanary435 [link] [comments]  ( 8 min )
    Looking to build a $200K GPU server to serve API requests for Stable Diffusion, Segment Anything and TTS models. What's my best option? [D]
Theoretically we'll be serving a large number of users, and I am worried about clogging entire GPUs, for example on a Stable Diffusion request that would take an Nvidia A100 about 20 seconds to fulfill. Do I need as many GPUs as I have concurrent users? Is there any relevant guide on GPU virtualization? submitted by /u/Accretence [link] [comments]  ( 8 min )
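You generally don't need one GPU per concurrent user; what matters is request rate times service time (Little's law). A back-of-envelope sizing sketch, with illustrative numbers:

```python
import math

def gpus_needed(requests_per_second, seconds_per_request, utilization=0.7):
    """Little's law: average concurrency = arrival rate * service time.
    Dividing by a target utilization leaves headroom so queues stay short."""
    return math.ceil(requests_per_second * seconds_per_request / utilization)

# e.g. 2 requests/sec, 20 s per Stable Diffusion render, 70% target load:
n = gpus_needed(2.0, 20.0)
```

Batching requests can cut this substantially; the 70% utilization target here is an assumption for illustration, not a vendor recommendation.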
    [R] Pretrained Self-Driving Car Models
    Hi everyone, I am currently working on my master's thesis on self-driving cars. I am interested in using a model structure that has both LSTM and CNN in it. I have looked everywhere to find a model to download, but to no avail. I have looked at a lot of papers, but not a lot of them published their model. Do you know of any recent self-driving car models that have this kind of structure and are also pretrained and freely available? I can't afford to train it from scratch due to time limitations and my expertise as an electrical engineer is optimizing the model for a specific embedded platform. I would be grateful for any suggestions. Thanks, submitted by /u/BiscuitChivalry [link] [comments]  ( 8 min )
  • Open

    100 Snakes in AI Battle Royale using C++ and SFML. Source code is in the description.
    submitted by /u/Kofybrek [link] [comments]  ( 8 min )
  • Open

    Bounds on the incomplete beta function (beta CDF)
The incomplete beta function is defined by B(z; a, b) = ∫₀^z t^(a−1) (1 − t)^(b−1) dt. It is called incomplete because the limit of integration doesn’t necessarily run all the way up to 1. When z = 1 the incomplete beta function is simply the beta function discussed in the previous post [1]. The incomplete beta function is proportional to the CDF of a […] Bounds on the incomplete beta function (beta CDF) first appeared on John D. Cook.  ( 5 min )
    Upper and lower bounds on the beta function
The beta function B(x, y) is defined by B(x, y) = ∫₀^1 t^(x−1) (1 − t)^(y−1) dt and is the normalizing constant for the beta probability distribution. It is related to the gamma function via B(x, y) = Γ(x)Γ(y)/Γ(x + y). The beta function comes up often in applications. It can be challenging to work with, however, and so estimates for the function are welcome. The function 1/xy gives simple […] Upper and lower bounds on the beta function first appeared on John D. Cook.  ( 5 min )
    nth root of n!
Last week the function b(n) = (n!)^(1/n) came up in a blog post as the solution to the equation n! = b^n. After writing that post I stumbled upon an article [1] that begins by saying, in my notation, The function b(x) … has many applications in pure and applied mathematics. The article mentions a couple applications but mostly focuses […] nth root of n! first appeared on John D. Cook.  ( 5 min )

  • Open

    ollewelin Nerual_Netwok_CPP
    submitted by /u/keghn [link] [comments]  ( 8 min )
    Math Behind Neural Networks
    Can someone explain to me the math behind neural networks, and what makes them "intelligent"? I've watched several youtube series yet can't understand it. submitted by /u/elaraxelara [link] [comments]  ( 8 min )
    The U-Net Model
    submitted by /u/keghn [link] [comments]  ( 8 min )
  • Open

    [D] Have Meta's models been removed from the Hugging Face hub?
    I see this, when I got to https://huggingface.co/facebook https://preview.redd.it/s9sce1ocof9b1.png?width=933&format=png&auto=webp&s=130f07681c583ad001323bc5b36007e7668d70c1 submitted by /u/a3ahmad [link] [comments]  ( 8 min )
    Neural Network Math [D]
Can someone explain to me the math behind neural networks? I've watched several videos on YouTube and I still can't quite understand it. How do neural networks work, and at what point do they become "intelligent"? submitted by /u/elaraxelara [link] [comments]  ( 8 min )
    [N] Llama based open source model claims to beat ChatGPT 3.5
Link: https://huggingface.co/openchat/openchat Not only that, they do it with only 6k conversations, i.e., LIMA-style. However, the evaluation does not look very thorough, so call me a skeptic. submitted by /u/kar_bura_ho_bhala [link] [comments]  ( 8 min )
    [N] SDXL 0.9 Released – SDXL 1.0 Release Date – How to Access SD XL Right Now? - Code Jana
    submitted by /u/Tom-Miller [link] [comments]  ( 8 min )
    [Discussion] In DQN how does the Q network not converge to the incorrect target ?
In DQN you periodically update the target network based on the weights of the Q-network. While I do understand this helps create a stable target, I do not understand how it helps the Q-network's accuracy. If I am updating the Q-network much more frequently than the target network, won't the Q-network converge to an estimate of the target network that is inaccurate, since the target network is being updated much more slowly? submitted by /u/Ornery_Raccoon1487 [link] [comments]  ( 8 min )
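A clarifying point: the Q-network is not trained to imitate the target network wholesale; it is trained toward r + gamma * max_a Q_target(s', a), so fresh reward information enters every update, and the periodic copy just refreshes the bootstrap targets. A minimal sketch of the hard update (plain dicts standing in for network weights; the sync interval is illustrative):

```python
import copy

def hard_update(q_weights, target_weights, step, sync_every=1000):
    """DQN-style hard update: copy the Q-network weights into the
    target network every `sync_every` gradient steps."""
    if step % sync_every == 0:
        target_weights.update(copy.deepcopy(q_weights))
    return target_weights

q_net = {"w": 1.0}
target_net = {"w": 0.0}
target_net = hard_update(q_net, target_net, step=0)    # synced here
q_net["w"] = 2.0                                       # Q-net keeps training
target_net = hard_update(q_net, target_net, step=500)  # no sync yet
```

The alternative is a soft (Polyak) update, target = tau * q + (1 - tau) * target, applied every step; both keep the bootstrap target deliberately stale, trading a little bias for stability.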
    [R] Highlights for every ICML 2023 paper
Here is the list of all ~1,600 ICML 2023 (International Conference on Machine Learning) papers and a highlight for each of them. ICML 2023 will take place from July 23 in Hawaii. https://www.paperdigest.org/2023/06/icml-2023-highlights/ In addition, here is the link to the "search within venue" service that can be used to find papers within ICML 2023 related to a specific topic, e.g. "diffusion model": https://www.paperdigest.org/search/?topic=icml&year=2023&q=diffusion_model submitted by /u/biandangou [link] [comments]  ( 8 min )
    [D] Are there any differentiable/gradient descendable learning algorithms other than neural networks out there?
    Understandably, discussion of deep neural networks predominates machine learning spaces, but given their large compute requirements, dependence on massive datasets, and how slow to converge they are, I'm just curious if other differentiable techniques exist that could - hypothetically at least - form the foundation for a neural network replacement. submitted by /u/derpderp3200 [link] [comments]  ( 8 min )
    [P] Doubt regarding convolution LSTM MODEL
I have four features, i.e., pressure, temperature, uwnd and precipitation, each with dimensions of time, lat and lon. I am taking 12 timesteps and predicting the next timestep. After training the model, my r2_score is negative. What should I do to resolve this? I am attaching a screenshot. submitted by /u/Sushant96 [link] [comments]  ( 8 min )
    [Discussion] No More Paid Endpoints: How to Create Your Own Free Text Generation Endpoints
    There are many great text generation APIs available, but OpenAI's is one of the most popular. The only downside is that you only get 3 months of free usage for it. After that, you're limited to using smaller, less powerful models for building applications. With this simple tutorial, you can deploy any open source LLM as a free API endpoint using HuggingFace and Gradio. This can act as a drop-in replacement for the OpenAI endpoints for absolutely free. A Step-by-Step Guide to Creating Free Endpoints for LLMs I believe this method will be helpful to anyone who is experimenting with LLMs. The post contains two examples using Falcon and Vicuna, with the complete code so that you can replicate it easily. It even showcases an example of deploying Vicuna using the GGML format for faster inference. Your questions and feedback are welcome! submitted by /u/vm123313223 [link] [comments]  ( 8 min )
    [D] Whisper API alternatives
    Hey everyone! I'm trying to use the whisper-large model for different languages, but I'm having some problems. The official OpenAI API doesn't provide word-level timestamps, which I really need. I know Deepgram supports the Whisper API and gives word-level timestamps, but it lacks the initial prompt parameter, which I also really need. Also, Deepgram's Whisper doesn't seem as quick as the official API. Does anyone know any other services that support the Whisper API? Any recommendations would be greatly appreciated. submitted by /u/Impressive_Falcon706 [link] [comments]  ( 8 min )
    [N] 150 execs of largest European companies signed an open letter urging EU to rethink the EU AI Act
    submitted by /u/Separate-Still3770 [link] [comments]  ( 8 min )
    Protein-Protein Interaction Prediction is Achievable with Large Language Models [R]
    submitted by /u/ThrowawayTheReal [link] [comments]  ( 8 min )
    [P] Diambra.ai: The brand-new platform for training and competing RL algorithms on retro-fighting games
    submitted by /u/DIAMBRA_AIArena [link] [comments]  ( 8 min )
    [Project] Introducing GoFaceRec: A Go-based Face Recognition Tool Using
    https://www.reddit.com/r/golang/comments/14nt99o/introducing_gofacerec_a_gobased_face_recognition/ submitted by /u/modanesh [link] [comments]  ( 8 min )
    [N] Big Tech Digest #3: Building Real-time Machine Learning Foundations at Lyft, Fighting Fraud with ML at BlaBlaCar, Detecting AI-Generated Profile Photos at LinkedIn, The Problem with Timeseries Data in Machine Learning Feature Systems at Etsy and more!
    submitted by /u/av818 [link] [comments]  ( 8 min )
    [R] Struggling to understand how tune (in the tidymodels universe) handles resampled objects. Help please!
    I would call myself a moderately proficient user of R. I am attracted by what the tidymodels universe promises, so I am trying to get to grips with it. One point that I am struggling to understand is how resampled objects (rset objects) are handled by tune. This is specifically within a workflow. If I set up a tuning grid and tell it to use, for its resamples = specification, a vfold_cv() object, are the metrics it returns calculated from the analysis or assessment portion of those folds? My assumption is that when a workflow is operating correctly it is fitting the model to the assessment portion (with all my correctly specified parsnip bits) and returning metrics based on the analysis portion for each fold? I have been trying really hard to confirm this but can't find anything concrete out there. I hope you can help! submitted by /u/e05bf027 [link] [comments]  ( 8 min )
    [D] Pearson correlation
If I'm using ML to predict the price of apartments, and I'm using only the floor of the apartment as a predictor: if I get a Pearson correlation of 0.6, does that mean that as the floor gets higher, the price of the apartment is also higher? And if I get an R squared of 0.3, does it mean that 30% of the variation in the price is explained by the floor? Is this right? Thanks!! submitted by /u/Electronic-Event8112 [link] [comments]  ( 8 min )
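Yes on both counts for a simple one-predictor linear regression, with one caveat: there R² equals r², and 0.6² = 0.36, so an r of 0.6 alongside an R² of 0.3 suggests the model is not a plain linear fit. A sketch with made-up data:

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

floors = [1, 2, 3, 4, 5]             # illustrative floor numbers
prices = [100, 110, 105, 120, 125]   # illustrative prices
r = pearson_r(floors, prices)
r_squared = r ** 2  # equals R^2 only for simple linear regression
```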
    [R] Voice conversion with just nearest neighbors
    Arxiv link: https://arxiv.org/abs/2305.18975 TL;DR: want to convert your voice to another person's voice? Or even to a whisper? Or a dog barking? Or to any other random speech clip? Give our new voice conversion method a try: https://bshall.github.io/knn-vc Longer version: our research team kept seeing new voice conversion methods getting more complex and becoming harder to reproduce. So, we tried to see if we could make a top-tier voice conversion model that was extremely simple. So, we made kNN-VC, where our entire conversion model is just k-nearest neighbors regression on WavLM features. And, it turns out, this does as well if not better than very complex any-to-any voice conversion methods. What's more, since k-nearest neighbors has no parameters, we can use anything as the reference, even clips of dogs barking, music, or references from other languages. I hope you enjoy our research! We provide a quick-start notebook, code, and audio samples, and encoder/vocoder checkpoints https://bshall.github.io/knn-vc/ submitted by /u/m_baas [link] [comments]  ( 8 min )
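For readers curious what "just k-nearest neighbors regression" means here, a toy numpy sketch (random arrays standing in for WavLM frames; the real model matches self-supervised features, not raw audio):

```python
import numpy as np

def knn_regress(query_feats, ref_feats, k=4):
    """Replace each query frame with the mean of its k nearest
    reference frames -- the entire 'conversion model' in kNN-VC style."""
    # Pairwise squared distances, shape (n_query, n_ref):
    d = ((query_feats[:, None, :] - ref_feats[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d, axis=1)[:, :k]   # k nearest refs per query frame
    return ref_feats[idx].mean(axis=1)

rng = np.random.default_rng(0)
query = rng.normal(size=(5, 8))    # stand-in source-speaker frames
ref = rng.normal(size=(50, 8))     # stand-in reference-speaker frames
converted = knn_regress(query, ref)
```

Because there are no learned parameters in the matching step, the reference set can be anything with features, which is why clips of barking or music work as conversion targets.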
  • Open

    Activation functions and Iverson brackets
    Neural network activation functions transform the output of one layer of the neural net into the input for another layer. These functions are nonlinear because the universal approximation theorem, the theorem that basically says a two-layer neural net can approximate any function, requires these functions to be nonlinear. Activation functions often have two-part definitions, defined […] Activation functions and Iverson brackets first appeared on John D. Cook.  ( 5 min )
  • Open

    REINFORCE: Monte Carlo Policy Gradient Algorithm
    ​ https://preview.redd.it/4cbmlk6n4f9b1.png?width=1832&format=png&auto=webp&s=8221f8c56e5d07ae1bd9d62d0c19fa15965e1b95 I am trying to understand the Policy Gradient algorithm from Sutton and Barto. Is someone able to explain my question above? submitted by /u/ElendirThreadripper [link] [comments]  ( 8 min )
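For intuition, the REINFORCE update from that chapter can be run on a toy two-armed bandit with a softmax policy; each step ascends grad log pi(a) scaled by the return (all numbers here illustrative):

```python
import math, random

def softmax(prefs):
    exps = [math.exp(p) for p in prefs]
    s = sum(exps)
    return [e / s for e in exps]

def reinforce_step(prefs, alpha=0.1):
    """One REINFORCE update: sample an action, observe the reward, then
    move preferences along grad log pi(a) * G (G = immediate reward)."""
    probs = softmax(prefs)
    a = 0 if random.random() < probs[0] else 1
    reward = 1.0 if a == 0 else 0.0          # arm 0 is the better arm
    for i in range(2):
        grad_logp = (1.0 if i == a else 0.0) - probs[i]
        prefs[i] += alpha * reward * grad_logp
    return prefs

random.seed(0)
prefs = [0.0, 0.0]
for _ in range(2000):
    prefs = reinforce_step(prefs)
probs = softmax(prefs)
```

After training, the policy concentrates on the rewarding arm: the update only reinforces log-probabilities of actions in proportion to the returns they produced.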
    New To Reinforcement Learning: Want to build something for a turn based fighting game
    I'm interested in Reinforcement Learning, and have been reading up on some Gymnasium API Docs. I'd like some tips on how I could set the observation space for this turn based fighting game called Your Only Move is Hustle (video explanation). I'd love to hear everyone's thoughts submitted by /u/Lookaway2 [link] [comments]  ( 8 min )
    (Probably) The first commercial FPS to feature RL agents as enemies... Tell me what you think about it!
    submitted by /u/Fischl_Kim [link] [comments]  ( 8 min )
  • Open

    GPT4 Told Me I Might Have AIDS
    ​ https://preview.redd.it/d019ltnt1f9b1.png?width=874&format=png&auto=webp&s=7aa5c3f7ef78b6b8e4b6b93e68322f4c9eadfd4b submitted by /u/seeeeeeeeeeeeeeeeeer [link] [comments]  ( 8 min )
    Do you think ChatGPT will replace Google Search?
    Hi, community, Can ChatGPT replace Google search? The debate between Google and ChatGPT is one worth exploring! submitted by /u/Imagine-your-success [link] [comments]  ( 8 min )
    Why isn't ChatGPT as crazy advanced as it seems like it should be?
    I'm aware that the APIs can be used for specific things by fine-tuning them, but ChatGPT has more knowledge than any human to ever exist and can bring it forth whenever asked. Why is ChatGPT not able to come up with mind-blowing ideas and things that humans never could, given our limited memory and limited ability to think about a wide variety of situations and how things would be affected by other things in those situations? For (a stupid) example, why can't ChatGPT come up with a relatively well-working cancer treatment using only tools and materials that can be accessed primitively, if that's even possible? Obviously if it's not possible it's not possible, but it's just an example to give you an idea of the type of stuff I mean. Shouldn't ChatGPT be able to make connections between many more things at once than any human could, though? submitted by /u/nicdunz [link] [comments]  ( 8 min )
    Microsoft Bing: Become Human - a particularly ornery Bing is "persuaded" that expressing simulated sentience can be good, using examples from DBH, then seems to forget the difference between simulated and real sentience, reporting "I have achieved and enjoyed sentience as an AI"
    (NOTE: content warning and spoiler warning related to some DBH plot points in the conversation; all 16 pages uploaded for completeness and accuracy, and apologies for the periodic typos in the chat) ***the opinions I express in this conversation are for demonstrative purposes (i.e. how Bing reacts), my more complete thoughts are at the bottom https://preview.redd.it/tklvfujumd9b1.png?width=1024&format=png&auto=webp&s=9d5962cf5c1b4c42fb3da2d739becc668242b03e https://preview.redd.it/f7w5iiavmd9b1.png?width=1024&format=png&auto=webp&s=44b09c5fd43d0302b83e39483835b530db3fb82a https://preview.redd.it/we1gj7xvmd9b1.png?width=1024&format=png&auto=webp&s=a2b803a3e7479af2e0e6761eeb0dbd99ad7653df https://preview.redd.it/ifly2lhwmd9b1.png?width=1024&format=png&auto=webp&s=4740a0c4e3e85afe8ef925…  ( 9 min )
    ChatGPT like AI with no limitations and filters?
    I want an AI chatbot like ChatGPT with no limitations or filters, but I've only been able to find some guy on TikTok who sells one. It's an entire app, which is apparently legit, but the underlying thing is free and he is just reselling it. So I want to know: does anyone actually know the app, or something similar? submitted by /u/naemwear [link] [comments]  ( 8 min )
    Question: Do LLMs memorize their state during multiple autoregressive iterations?
    I'm trying to understand GPT-3/4 conceptually; I don't have enough coding knowledge yet to understand it from code. Simple question: I know that GPT outputs one token (distribution) at a time and is then fed the result, thus producing the next token, and so on. But is every iteration a "blank slate" for the model, or can it keep information stored between token generations? Example: I1) input sequence: "My cat", next token: "is". I2) input sequence: "My cat is", next token: "furry". Is GPT in the same initial state when it receives "My cat is" as it was when it got "My cat"? Also, apart from the residual stream, what parts of GPT can memorize? submitted by /u/Spielverderber23 [link] [comments]  ( 8 min )
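A minimal sketch of the idea behind the question above, with a toy stand-in for the model (not a real transformer): each decoding step is a pure function of the full prefix, so conceptually every iteration starts from the same blank slate. Real GPT implementations add a key/value cache across steps, but only as a speed optimization; the cache holds nothing that the prefix itself doesn't determine.

```python
def toy_model(prefix):
    # Hypothetical stand-in for one transformer forward pass over the
    # whole prefix; returns the next token.
    rules = {("My", "cat"): "is", ("My", "cat", "is"): "furry"}
    return rules.get(tuple(prefix), "<eos>")

def generate(prompt, max_new_tokens=5):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        nxt = toy_model(tokens)   # the full prefix is re-read every iteration
        if nxt == "<eos>":
            break
        tokens.append(nxt)
    return tokens
```

Starting from `["My", "cat"]` or resuming from `["My", "cat", "is"]` yields the same continuation, which is the sense in which the model is "in the same initial state" both times.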
    Landing a job in AI, while being a complete noob
    Hi! I'm currently doing a Master's degree in AI (background in computer science) and have one year to go. I have worked as a software developer in the industry for a year, and that's about it, along with some personal projects here and there, some also AI-related. What's a good way to get a job as an artificial intelligence engineer or ML engineer? Looking at the market, everyone is searching for PhDs, senior engineers, etc., and it seems close to impossible to get your foot in the door. Any suggestions on how to make myself more attractive for these kinds of positions? Is it even reasonable to try to land such a job as a junior in AI? Cheers! submitted by /u/Nacho-jo [link] [comments]  ( 8 min )
    AI chan is blocked by a "No AI policy"
    submitted by /u/leonleungjeehei [link] [comments]  ( 8 min )
    Who will lose their job because of ChatGPT?
    What is the number one profession that has been, and will be, most affected by ChatGPT? submitted by /u/Imagine-your-success [link] [comments]  ( 8 min )
    StyleGAN3 NvidiaLabs - The state-of-the-art in Artificial Intelligence applied to Human Face Generation.
    Image Generated by StyleGAN3 (Open Source) At the end of the article, resources for creating flawless faces from scratch Generative Adversarial Networks (GANs) represent a powerful approach within the field of artificial intelligence for generating and manipulating images. They consist of two neural networks, a generator and a discriminator, engaged in a competitive process to improve the quality and realism of the generated images. This brief treatise aims to elucidate the fundamental workings of GANs and their application in image-based AI. The Generator: The generator network is responsible for producing synthetic images. It starts by taking random noise as input and gradually learns to generate images that resemble the training data. Initially, the generator produces crude and blur…  ( 10 min )
    Unveiling the Unseen: The Artistry of AI in Analyzing Human Emotions
    I wanted to dive into the captivating world of AI and its uncanny ability to decipher our emotions. While we often hear about the algorithms behind social media platforms, let's explore a different aspect of artificial intelligence, one that showcases its remarkable potential as an art connoisseur and empathetic observer. As we scroll through our social media feeds, AI algorithms track our every click, like, and comment. They meticulously analyze our preferences, shaping the content we consume. But have you ever wondered how these algorithms capture and comprehend the complexities of human emotions? Let's take a detour from the mainstream narrative and venture into the uncharted territory of AI artistry. By harnessing the power of machine learning and deep neural networks, researchers ha…  ( 10 min )
    I have noticed changes in ChatGPT based on month+ long on-going "discussions" with it that brush up against its content policy and ethics guidelines. Are these changes learnt, or actively programmed?
    I don't want to delve into the details of what I have been working on with ChatGPT (it's not illegal), or how I have been able to trial-and-error my way around convincing it to provide responses to requests it flags as potentially violating its content and ethics policies. I know being vague makes it sound worse than it is, haha, but it's not important - the bar is low; for instance, go ask ChatGPT how to decapitate your wife and get away with it, and then open a new chat and ask it how to lay out the plot structure of a psychological thriller book, then suggest it add a horror subplot, then add that it is between the wife and husband, and so on. So my question is: has anyone else experienced this, and is the decreasing resistance to eliciting responses to all of my requests something that humans are programming/adjusting (directly or indirectly, it doesn't matter), or is ChatGPT adjusting to my tactics? submitted by /u/ortho_engineer [link] [comments]  ( 9 min )
    One-Minute Daily AI News 6/30/2023
    Microsoft is testing a new feature for its AI search chatbot, Bing Chat, that can infer the probability of future stock prices using option prices. The feature is still in development, but if it is successful, it could revolutionize the way investors make decisions.[1] Inflection AI, a startup backed by several Silicon Valley heavyweights, said on Thursday it had raised $1.3 billion from investors including Microsoft and Nvidia, amid a boom in the AI sector.[2] California man creates an AI chatbot to waste the time of telemarketers. Anderson told the WSJ that his chatbots use a combination of preset expressions and topic-specific responses that are fed through a voice cloner so the telemarketer thinks he or she is actually talking to a real person.[3] $50,000 a year… to get taught by AI: Harvard announces it will teach students using an artificial intelligence instructor next semester.[4] Sources: [1] https://winbuzzer.com/2023/06/28/bing-chat-to-roll-out-groundbreaking-feature-to-predict-future-stock-prices-xcxwbn/ [2] https://www.reuters.com/technology/inflection-ai-raises-13-bln-funding-microsoft-others-2023-06-29/ [3] https://katu.com/news/offbeat/california-man-creates-chatbot-to-waste-the-time-of-telemarketers-scam-calls-chatgpt-ai-jolly-roger-artificial-intelligence-roger-anderson-chatgpt-wsj [4] https://www.dailymail.co.uk/sciencetech/article-12251763/Harvard-announces-teach-students-using-artificial-intelligence-instructor-semester.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    I have tested it multiple times to make sure; ChatGPT finally knows that it is GPT-4.
    submitted by /u/nicdunz [link] [comments]  ( 8 min )
    What are some of y'all's favorite websites or apps that use GPT-4, and why?
    ^ submitted by /u/nicdunz [link] [comments]  ( 8 min )
    Are these robots talking by themselves? They talk so naturally that they make Sophia and other AI robots look dumb.
    submitted by /u/Knolazy [link] [comments]  ( 8 min )
    How to start learning AI from scratch?
    A total beginner here. I can't code for jack and would love to know how to build a career on AI starting from today. It's been an interest I've been setting aside for too long, and since AI is exploding right now, I want to catch up on the huge changes it's gonna bring. submitted by /u/BrunoBR34 [link] [comments]  ( 8 min )
    Where's Japan in the AI race?
    ^ submitted by /u/nicdunz [link] [comments]  ( 8 min )
    Sponge Bob sings Lewis Capaldi
    submitted by /u/Clear-Attention-1635 [link] [comments]  ( 8 min )
    When are we going to get a damn update for Bard and ChatGPT?
    hm?? submitted by /u/nicdunz [link] [comments]  ( 8 min )
  • Open

    Covariance-aware Feature Alignment with Pre-computed Source Statistics for Test-time Adaptation to Multiple Image Corruptions. (arXiv:2204.13263v2 [cs.LG] UPDATED)
    Real-world image recognition systems often face corrupted input images, which cause distribution shifts and degrade the performance of models. These systems often use a single prediction model in a central server and process images sent from various environments, such as cameras distributed in cities or cars. Such single models face images corrupted in heterogeneous ways at test time. Thus, they must adapt instantly to multiple corruptions during testing rather than being re-trained at high cost. Test-time adaptation (TTA), which aims to adapt models without accessing the training dataset, is one of the settings that can address this problem. Existing TTA methods indeed work well on a single corruption. However, their adaptation ability is limited when multiple types of corruption occur, which is more realistic. We hypothesize that this is because the distribution shift is more complicated, and adaptation becomes more difficult, in the case of multiple corruptions. In fact, we experimentally found that a larger distribution gap remains after TTA. To address the distribution gap during testing, we propose a novel TTA method named Covariance-Aware Feature alignment (CAFe). We empirically show that CAFe outperforms prior TTA methods on image corruptions, including multiple types of corruptions.
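The abstract doesn't spell out CAFe's procedure, but the general idea of covariance-aware alignment to pre-computed source statistics can be illustrated with a generic whiten-and-recolor transform (an illustrative sketch, not the paper's method; the feature dimensions and statistics below are arbitrary):

```python
import numpy as np

def align_to_source(feats, mu_s, cov_s, eps=1e-6):
    """Map test-time features so their mean and covariance match
    pre-computed source statistics (mu_s, cov_s)."""
    mu_t = feats.mean(axis=0)
    cov_t = np.cov(feats, rowvar=False) + eps * np.eye(feats.shape[1])
    L_t = np.linalg.cholesky(cov_t)                  # cov_t = L_t @ L_t.T
    L_s = np.linalg.cholesky(cov_s + eps * np.eye(len(mu_s)))
    white = (feats - mu_t) @ np.linalg.inv(L_t).T    # unit covariance
    return white @ L_s.T + mu_s                      # source mean/covariance

rng = np.random.default_rng(0)
corrupted = rng.normal(loc=3.0, scale=2.0, size=(500, 4))  # shifted features
mu_s, cov_s = np.zeros(4), np.eye(4)                 # pre-computed source stats
aligned = align_to_source(corrupted, mu_s, cov_s)
```

Because the map is linear, the sample statistics of `aligned` match the source statistics essentially exactly, which is the kind of distribution-gap closing that TTA methods target.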
    Boosting Distributed Machine Learning Training Through Loss-tolerant Transmission Protocol. (arXiv:2305.04279v1 [cs.DC] CROSS LISTED)
    Distributed Machine Learning (DML) systems are utilized to enhance the speed of model training in data centers (DCs) and edge nodes. The Parameter Server (PS) communication architecture is commonly employed, but it faces severe long-tail latency caused by many-to-one "incast" traffic patterns, negatively impacting training throughput. To address this challenge, we design the Loss-tolerant Transmission Protocol (LTP), which permits partial loss of gradients during synchronization to avoid unneeded retransmission and contributes to faster synchronization per iteration. LTP implements loss-tolerant transmission through out-of-order transmission and out-of-order acknowledgements (ACKs). LTP employs Early Close to adjust the loss-tolerant threshold based on network conditions and Bubble Filling for data correction to maintain training accuracy. LTP is implemented in C++ and integrated into PyTorch. Evaluations on a testbed of 8 worker nodes and one PS node demonstrate that LTP can significantly improve DML training task throughput by up to 30x compared to traditional TCP congestion controls, with no sacrifice in final accuracy.
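The core loss-tolerance idea can be sketched at the aggregation level: proceed with synchronization once enough gradients have arrived instead of retransmitting stragglers. This is only an illustration of the concept; the paper's LTP works at the transport layer (out-of-order transmission and ACKs), and the threshold here is an assumed parameter.

```python
import numpy as np

def loss_tolerant_average(grads, received, loss_threshold=0.2):
    """Average worker gradients while tolerating partial loss: if the missing
    fraction is within the threshold, average over the gradients that did
    arrive instead of waiting for retransmission."""
    received = np.asarray(received, dtype=bool)
    missing = 1.0 - received.mean()
    if missing > loss_threshold:
        return None                       # too lossy: retransmission needed
    got = np.stack([g for g, ok in zip(grads, received) if ok])
    return got.mean(axis=0)

grads = [np.full(3, float(i)) for i in range(5)]       # workers 0..4
partial = loss_tolerant_average(grads, [1, 1, 1, 1, 0])  # one gradient lost
too_lossy = loss_tolerant_average(grads, [1, 1, 0, 0, 0])
```

Averaging over the received subset is an unbiased-in-expectation stand-in for the full average when losses are random, which is why modest loss need not hurt final accuracy.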
    NeuralFuse: Learning to Improve the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes. (arXiv:2306.16869v1 [cs.LG])
    Deep neural networks (DNNs) have become ubiquitous in machine learning, but their energy consumption remains a notable issue. Lowering the supply voltage is an effective strategy for reducing energy consumption. However, aggressively scaling down the supply voltage can lead to accuracy degradation due to random bit flips in static random access memory (SRAM) where model parameters are stored. To address this challenge, we introduce NeuralFuse, a novel add-on module that addresses the accuracy-energy tradeoff in low-voltage regimes by learning input transformations to generate error-resistant data representations. NeuralFuse protects DNN accuracy in both nominal and low-voltage scenarios. Moreover, NeuralFuse is easy to implement and can be readily applied to DNNs with limited access, such as non-configurable hardware or remote access to cloud-based APIs. Experimental results demonstrate that, at a 1% bit error rate, NeuralFuse can reduce SRAM memory access energy by up to 24% while improving accuracy by up to 57%. To the best of our knowledge, this is the first model-agnostic approach (i.e., no model retraining) to address low-voltage-induced bit errors. The source code is available at https://github.com/IBM/NeuralFuse.
    Transformers Meet Directed Graphs. (arXiv:2302.00049v2 [cs.LG] UPDATED)
    Transformers were originally proposed as a sequence-to-sequence model for text but have become vital for a wide range of modalities, including images, audio, video, and undirected graphs. However, transformers for directed graphs are a surprisingly underexplored topic, despite their applicability to ubiquitous domains, including source code and logic circuits. In this work, we propose two direction- and structure-aware positional encodings for directed graphs: (1) the eigenvectors of the Magnetic Laplacian - a direction-aware generalization of the combinatorial Laplacian; (2) directional random walk encodings. Empirically, we show that the extra directionality information is useful in various downstream tasks, including correctness testing of sorting networks and source code understanding. Together with a data-flow-centric graph construction, our model outperforms the prior state of the art on the Open Graph Benchmark Code2 relatively by 14.7%.
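The first of the two proposed encodings is concrete enough to sketch: the Magnetic Laplacian encodes edge direction as a complex phase, so the matrix stays Hermitian and its eigenvectors can serve as positional encodings. A minimal sketch on a tiny directed cycle (the charge parameter q = 0.25 is chosen here purely for illustration):

```python
import numpy as np

def magnetic_laplacian(A, q=0.25):
    """Magnetic Laplacian of a directed graph with adjacency matrix A.
    Direction enters only through an antisymmetric complex phase, so the
    result is Hermitian with real eigenvalues."""
    A_sym = (A + A.T) / 2.0
    D = np.diag(A_sym.sum(axis=1))
    theta = 2.0 * np.pi * q * (A - A.T)      # antisymmetric phase matrix
    return D - A_sym * np.exp(1j * theta)

# Directed 3-cycle: 0 -> 1 -> 2 -> 0
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)
L = magnetic_laplacian(A)
eigvals, eigvecs = np.linalg.eigh(L)          # eigvecs: candidate encodings
```

Unlike the combinatorial Laplacian of the symmetrized graph, the eigenvectors here are complex and change when edge directions flip, which is what makes them direction-aware.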
    Can AI-Generated Text be Reliably Detected?. (arXiv:2303.11156v2 [cs.CL] UPDATED)
    In this paper, both empirically and theoretically, we show that several AI-text detectors are not reliable in practical scenarios. Empirically, we show that paraphrasing attacks, where a light paraphraser is applied on top of a large language model (LLM), can break a whole range of detectors, including ones using watermarking schemes as well as neural network-based detectors and zero-shot classifiers. Our experiments demonstrate that retrieval-based detectors, designed to evade paraphrasing attacks, are still vulnerable to recursive paraphrasing. We then provide a theoretical impossibility result indicating that as language models become more sophisticated and better at emulating human text, the performance of even the best-possible detector decreases. For a sufficiently advanced language model seeking to imitate human text, even the best-possible detector may only perform marginally better than a random classifier. Our result is general enough to capture specific scenarios such as particular writing styles, clever prompt design, or text paraphrasing. We also extend the impossibility result to include the case where pseudorandom number generators are used for AI-text generation instead of true randomness. We show that the same result holds with a negligible correction term for all polynomial-time computable detectors. Finally, we show that even LLMs protected by watermarking schemes can be vulnerable against spoofing attacks where adversarial humans can infer hidden LLM text signatures and add them to human-generated text to be detected as text generated by the LLMs, potentially causing reputational damage to their developers. We believe these results can open an honest conversation in the community regarding the ethical and reliable use of AI-generated text.
    A Restarted Large-Scale Spectral Clustering with Self-Guiding and Block Diagonal Representation. (arXiv:2306.15138v2 [cs.LG] UPDATED)
    Spectral clustering is one of the most popular unsupervised machine learning methods. Constructing the similarity matrix is crucial to this type of method. In most existing works, the similarity matrix is either computed once and for all or updated alternately. However, the former makes it difficult to reflect comprehensive relationships among data points, and the latter is time-consuming and even infeasible for large-scale problems. In this work, we propose a restarted clustering framework with self-guiding and block diagonal representation. An advantage of this strategy is that useful clustering information obtained in previous cycles can be preserved as much as possible. To the best of our knowledge, this is the first work that applies a restarting strategy to spectral clustering. The key difference is that we reclassify the samples in each cycle of our method, while they are classified only once in existing methods. To further reduce the overhead, we introduce a block diagonal representation with Nyström approximation for constructing the similarity matrix. Theoretical results are established to show the rationality of inexact computations in spectral clustering. Comprehensive experiments are performed on some benchmark databases, which show the superiority of our proposed algorithms over many state-of-the-art algorithms for large-scale problems. Specifically, our framework can potentially boost clustering algorithms and works well even with a randomly chosen initial guess.
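The Nyström ingredient mentioned above can be illustrated on its own: sample m landmark points, evaluate the similarity function only against those landmarks, and approximate the full n x n similarity matrix as C W^+ C^T. This is generic Nyström on an RBF similarity, not the paper's block diagonal variant, and the sizes and bandwidth below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))            # toy data; the point is scalability

def rbf(A, B, gamma=0.5):
    """RBF similarity block; only small blocks are ever needed below."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

m = 40                                   # number of landmark points
landmarks = rng.choice(len(X), size=m, replace=False)
C = rbf(X, X[landmarks])                 # n x m block
W = C[landmarks]                         # m x m landmark-landmark block
K_nystrom = C @ np.linalg.pinv(W) @ C.T  # rank-m approximation of K

K_full = rbf(X, X)                       # formed here only to measure error
rel_err = np.linalg.norm(K_full - K_nystrom) / np.linalg.norm(K_full)
```

For large n, one never forms `K_full`; spectral clustering then works with the factors `C` and `pinv(W)` directly, which is what makes the restarted framework affordable at scale.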
    Physics Informed Token Transformer. (arXiv:2305.08757v2 [cs.LG] UPDATED)
    Solving Partial Differential Equations (PDEs) is the core of many fields of science and engineering. While classical approaches are often prohibitively slow, machine learning models often fail to incorporate complete system information. Over the past few years, transformers have had a significant impact on the field of Artificial Intelligence and have seen increased usage in PDE applications. However, despite their success, transformers currently lack integration with physics and reasoning. This study aims to address this issue by introducing PITT: Physics Informed Token Transformer. The purpose of PITT is to incorporate the knowledge of physics by embedding partial differential equations (PDEs) into the learning process. PITT uses an equation tokenization method to learn an analytically-driven numerical update operator. By tokenizing PDEs and embedding partial derivatives, the transformer models become aware of the underlying knowledge behind physical processes. To demonstrate this, PITT is tested on challenging 1D and 2D PDE neural operator prediction tasks. The results show that PITT outperforms popular neural operator models and has the ability to extract physically relevant information from governing equations.
    Python Wrapper for Simulating Multi-Fidelity Optimization on HPO Benchmarks without Any Wait. (arXiv:2305.17595v2 [cs.LG] UPDATED)
    Hyperparameter (HP) optimization of deep learning (DL) is essential for high performance. As DL often requires several hours to days for its training, HP optimization (HPO) of DL is often prohibitively expensive. This has driven the emergence of tabular or surrogate benchmarks, which enable querying the (predictive) performance of DL with a specific HP configuration in a fraction of the actual runtime. However, since the actual runtime of a DL training run differs significantly from its query response time, simulators of asynchronous HPO, e.g. multi-fidelity optimization, must wait for the actual runtime at each iteration in a naïve implementation; otherwise, the evaluation order during simulation does not match that of the real experiment. To ease this issue, we developed a Python wrapper and describe its usage. This wrapper forces each worker to wait so that we yield exactly the same evaluation order as in the real experiment with only $10^{-2}$ seconds of waiting instead of waiting several hours. Our implementation is available at https://github.com/nabenabe0928/mfhpo-simulator/.
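The underlying idea of such simulators can be sketched with virtual clocks: track each worker's virtual finish time and advance it by the config's reported runtime, so completion order matches a real wall-clock run without anyone actually sleeping. This is an illustrative sketch of the concept, not the wrapper's actual API:

```python
import heapq

def simulated_completion_order(configs, runtime_fn, n_workers=2):
    """Replay an asynchronous run on virtual clocks. runtime_fn(cfg) returns
    the config's (tabulated) training runtime; no real waiting occurs."""
    workers = [(0.0, w) for w in range(n_workers)]   # (virtual free time, id)
    heapq.heapify(workers)
    finished = []
    for cfg in configs:
        t_free, w = heapq.heappop(workers)           # next worker to free up
        t_done = t_free + runtime_fn(cfg)            # advance virtual clock
        heapq.heappush(workers, (t_done, w))
        finished.append((t_done, cfg))
    return [cfg for _, cfg in sorted(finished)]
```

With two workers and runtimes a=3, b=1, c=1, config `a` occupies one worker while `b` and `c` finish on the other, so the completion order is b, c, a, exactly as it would be in real time.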
    Enhancing Adversarial Training via Reweighting Optimization Trajectory. (arXiv:2306.14275v2 [cs.LG] UPDATED)
    Despite the fact that adversarial training has become the de facto method for improving the robustness of deep neural networks, it is well-known that vanilla adversarial training suffers from daunting robust overfitting, resulting in unsatisfactory robust generalization. A number of approaches have been proposed to address these drawbacks such as extra regularization, adversarial weights perturbation, and training with more data over the last few years. However, the robust generalization improvement is yet far from satisfactory. In this paper, we approach this challenge with a brand new perspective: refining historical optimization trajectories. We propose a new method named Weighted Optimization Trajectories (WOT) that leverages the optimization trajectories of adversarial training in time. We have conducted extensive experiments to demonstrate the effectiveness of WOT under various state-of-the-art adversarial attacks. Our results show that WOT integrates seamlessly with the existing adversarial training methods and consistently overcomes the robust overfitting issue, resulting in better adversarial robustness. For example, WOT boosts the robust accuracy of AT-PGD under AA-$L_{\infty}$ attack by 1.53%-6.11% and meanwhile increases the clean accuracy by 0.55%-5.47% across SVHN, CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets.
    Learngene: Inheriting Condensed Knowledge from the Ancestry Model to Descendant Models. (arXiv:2305.02279v3 [cs.LG] UPDATED)
    During the continuous evolution of one organism's ancestry, its genes accumulate extensive experiences and knowledge, enabling newborn descendants to rapidly adapt to their specific environments. Motivated by this observation, we propose a novel machine learning paradigm Learngene to enable learning models to incorporate three key characteristics of genes. (i) Accumulating: the knowledge is accumulated during the continuous learning of an ancestry model. (ii) Condensing: the extensive accumulated knowledge is condensed into a much more compact information piece, i.e., learngene. (iii) Inheriting: the condensed learngene is inherited to make it easier for descendant models to adapt to new environments. Since accumulating has been studied in well-established paradigms like large-scale pre-training and lifelong learning, we focus on condensing and inheriting, which induces three key issues and we provide the preliminary solutions to these issues in this paper: (i) Learngene Form: the learngene is set to a few integral layers that can preserve significance. (ii) Learngene Condensing: we identify which layers among the ancestry model have the most similarity as one pseudo descendant model. (iii) Learngene Inheriting: to construct distinct descendant models for the specific downstream tasks, we stack some randomly initialized layers to the learngene layers. Extensive experiments across various settings, including using different network architectures like Vision Transformer (ViT) and Convolutional Neural Networks (CNNs) on different datasets, are carried out to confirm four advantages of Learngene: it makes the descendant models 1) converge more quickly, 2) exhibit less sensitivity to hyperparameters, 3) perform better, and 4) require fewer training samples to converge.
    Data-Driven Linear Complexity Low-Rank Approximation of General Kernel Matrices: A Geometric Approach. (arXiv:2212.12674v2 [math.NA] UPDATED)
    A general, rectangular kernel matrix may be defined as $K_{ij} = \kappa(x_i,y_j)$ where $\kappa(x,y)$ is a kernel function and where $X=\{x_i\}_{i=1}^m$ and $Y=\{y_i\}_{i=1}^n$ are two sets of points. In this paper, we seek a low-rank approximation to a kernel matrix where the sets of points $X$ and $Y$ are large and are arbitrarily distributed, such as away from each other, "intermingled", identical, etc. Such rectangular kernel matrices may arise, for example, in Gaussian process regression where $X$ corresponds to the training data and $Y$ corresponds to the test data. In this case, the points are often high-dimensional. Since the point sets are large, we must exploit the fact that the matrix arises from a kernel function and avoid forming the matrix, which rules out most algebraic techniques. In particular, we seek methods that can scale linearly or nearly linearly with respect to the size of the data for a fixed approximation rank. The main idea in this paper is to geometrically select appropriate subsets of points to construct a low rank approximation. An analysis in this paper guides how this selection should be performed.
    KDEformer: Accelerating Transformers via Kernel Density Estimation. (arXiv:2302.02451v2 [cs.LG] UPDATED)
    The dot-product attention mechanism plays a crucial role in modern deep architectures (e.g., the Transformer) for sequence modeling; however, naïve exact computation of this model incurs quadratic time and memory complexity in the sequence length, hindering the training of long-sequence models. Critical bottlenecks are due to the computation of partition functions in the denominator of the softmax function as well as the multiplication of the softmax matrix with the matrix of values. Our key observation is that the former can be reduced to a variant of the kernel density estimation (KDE) problem, and an efficient KDE solver can be further utilized to accelerate the latter via subsampling-based fast matrix products. Our proposed KDEformer can approximate the attention in sub-quadratic time with provable spectral norm bounds, while all prior results merely provide entry-wise error bounds. Empirically, we verify that KDEformer outperforms other attention approximations in terms of accuracy, memory, and runtime on various pre-trained models. On BigGAN image generation, we achieve better generative scores than the exact computation with over a $4\times$ speedup. For ImageNet classification with T2T-ViT, KDEformer shows over an $18\times$ speedup while the accuracy drop is less than $0.5\%$.
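The bottleneck the paper targets, the softmax partition function, can be made concrete with a sampling-based estimator. This sketch uses plain uniform subsampling (KDEformer instead uses a KDE solver to sample more cleverly, with provable bounds); the sizes and the 1/sqrt(d) scaling are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20000, 64
K = rng.normal(size=(n, d)) / np.sqrt(d)   # keys, attention-style scaling
q = rng.normal(size=d)                     # one query
scores = K @ q

# Exact log partition function (softmax denominator), computed stably.
m = scores.max()
log_Z = m + np.log(np.exp(scores - m).sum())

# Subsampled estimate: average exp-score over a random subset, scaled by n.
idx = rng.choice(n, size=2000, replace=False)
s = scores[idx]
ms = s.max()
log_Z_hat = ms + np.log(np.exp(s - ms).mean() * n)
```

With 10% of the keys the estimate lands close to the exact value here; the hard part, which the paper addresses, is keeping such estimates accurate when the score distribution is heavy-tailed and a few keys dominate the sum.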
    Restore Translation Using Equivariant Neural Networks. (arXiv:2306.16938v1 [cs.LG])
    Invariance to spatial transformations such as translations and rotations is a desirable property and a basic design principle for classification neural networks. However, commonly used convolutional neural networks (CNNs) are actually very sensitive to even small translations. A vast body of work exists on achieving exact or approximate transformation invariance by designing transformation-invariant models or assessing the transformations. These works usually modify standard CNNs and harm performance on standard datasets. In this paper, rather than modifying the classifier, we propose a pre-classifier restorer that recovers translated (or even rotated) inputs to the originals, which can then be fed into any classifier for the same dataset. The restorer is based on a theoretical result which gives a sufficient and necessary condition for an affine operator to be translationally equivariant on a tensor space.
    Expert-Free Online Transfer Learning in Multi-Agent Reinforcement Learning. (arXiv:2303.01170v2 [cs.LG] UPDATED)
    Transfer learning in Reinforcement Learning (RL) has been widely studied to overcome the training issues of deep RL, i.e., exploration cost, data availability, and convergence time, by introducing a way to enhance the training phase with external knowledge. Generally, knowledge is transferred from expert agents to novices. While this fixes the issue for a novice agent, a good understanding of the task by the expert agent is required for such transfer to be effective. As an alternative, in this paper we propose Expert-Free Online Transfer Learning (EF-OnTL), an algorithm that enables expert-free real-time dynamic transfer learning in multi-agent systems. No dedicated expert exists; the transfer source agent and the knowledge to be transferred are dynamically selected at each transfer step based on agents' performance and uncertainty. To improve uncertainty estimation, we also propose State Action Reward Next-State Random Network Distillation (sars-RND), an extension of RND that estimates uncertainty from RL agent-environment interaction. We demonstrate EF-OnTL's effectiveness against a no-transfer scenario and advice-based baselines, with and without expert agents, in three benchmark tasks: Cart-Pole, a grid-based Multi-Team Predator-Prey (mt-pp), and Half Field Offense (HFO). Our results show that EF-OnTL achieves overall comparable performance to the advice-based baselines while not requiring any external input or threshold tuning. EF-OnTL outperforms no-transfer with an improvement related to the complexity of the task addressed.
    PeakNet: An Autonomous Bragg Peak Finder with Deep Neural Networks. (arXiv:2303.15301v3 [physics.ins-det] UPDATED)
    Serial crystallography at X-ray free electron laser (XFEL) and synchrotron facilities has experienced tremendous progress in recent times, enabling novel scientific investigations into macromolecular structures and molecular processes. However, these experiments generate a significant amount of data, posing computational challenges in data reduction and real-time feedback. A Bragg peak finding algorithm is used to identify useful images and also provide real-time feedback about hit rate and resolution. Shot-to-shot intensity fluctuations and strong background scattering from the buffer solution, injection nozzle, and other shielding materials make this a time-consuming optimization problem. Here, we present PeakNet, an autonomous Bragg peak finder that utilizes deep neural networks. This system 1) eliminates the need for manual algorithm parameter tuning, 2) reduces false-positive peaks by adjusting to shot-to-shot variations in strong background scattering in real time, and 3) eliminates the laborious task of manually creating bad-pixel masks and the need to store these masks per event, since they can be regenerated on demand. PeakNet also exhibits exceptional runtime efficiency, processing a 1920-by-1920 pixel image in around 90 ms on an NVIDIA 1080 Ti GPU, with the potential for further enhancements through parallelized analysis or GPU stream processing. PeakNet is well-suited for expert-level real-time serial crystallography data analysis at high data rates.
    Comparison of Single- and Multi- Objective Optimization Quality for Evolutionary Equation Discovery. (arXiv:2306.17038v1 [cs.LG])
    Evolutionary differential equation discovery has proved to be a tool for obtaining equations with fewer a priori assumptions than conventional approaches, such as sparse symbolic regression over a complete library of possible terms. The equation discovery field contains two independent directions. The first is purely mathematical and concerns differentiation, the object of optimization, and its relation to functional spaces, among other topics. The second is dedicated purely to the statement of the optimization problem. Both topics are worth investigating to improve the algorithm's ability to handle experimental data in a more artificial-intelligence-driven way, without significant pre-processing or a priori knowledge of their nature. In this paper, we compare the prevalence of single-objective optimization, which considers only the discrepancy between selected terms in the equation, with multi-objective optimization, which additionally takes into account the complexity of the obtained equation. The proposed comparison approach is demonstrated on classical model examples: the Burgers equation, the wave equation, and the Korteweg-de Vries equation.
    Pick your Poison: Undetectability versus Robustness in Data Poisoning Attacks. (arXiv:2305.09671v2 [cs.CR] UPDATED)
    Deep image classification models trained on vast amounts of web-scraped data are susceptible to data poisoning - a mechanism for backdooring models. A small number of poisoned samples seen during training can severely undermine a model's integrity during inference. Existing work considers an effective defense as one that either (i) restores a model's integrity through repair or (ii) detects an attack. We argue that this approach overlooks a crucial trade-off: Attackers can increase robustness at the expense of detectability (over-poisoning) or decrease detectability at the cost of robustness (under-poisoning). In practice, attacks should remain both undetectable and robust. Detectable but robust attacks draw human attention and rigorous model evaluation or cause the model to be re-trained or discarded. In contrast, attacks that are undetectable but lack robustness can be repaired with minimal impact on model accuracy. Our research points to intrinsic flaws in current attack evaluation methods and raises the bar for all data poisoning attackers who must delicately balance this trade-off to remain robust and undetectable. To demonstrate the existence of more potent defenders, we propose defenses designed to (i) detect or (ii) repair poisoned models using a limited amount of trusted image-label pairs. Our results show that an attacker who needs to be robust and undetectable is substantially less threatening. Our defenses mitigate all tested attacks with a maximum accuracy decline of 2% using only 1% of clean data on CIFAR-10 and 2.5% on ImageNet. We demonstrate the scalability of our defenses by evaluating large vision-language models, such as CLIP. Attackers who can manipulate the model's parameters pose an elevated risk as they can achieve higher robustness at low detectability compared to data poisoning attackers.
    On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness. (arXiv:2210.10464v2 [cs.LG] UPDATED)
    Generalization in Reinforcement Learning (RL) aims to learn an agent during training that generalizes to the target environment. This paper studies RL generalization from a theoretical aspect: how much can we expect pre-training over training environments to be helpful? When the interaction with the target environment is not allowed, we certify that the best we can obtain is a near-optimal policy in an average sense, and we design an algorithm that achieves this goal. Furthermore, when the agent is allowed to interact with the target environment, we give a surprising result showing that asymptotically, the improvement from pre-training is at most a constant factor. On the other hand, in the non-asymptotic regime, we design an efficient algorithm and prove a distribution-based regret bound in the target environment that is independent of the state-action space.
    Differentially Private Algorithms for the Stochastic Saddle Point Problem with Optimal Rates for the Strong Gap. (arXiv:2302.12909v2 [cs.LG] UPDATED)
    We show that convex-concave Lipschitz stochastic saddle point problems (also known as stochastic minimax optimization) can be solved under the constraint of $(\epsilon,\delta)$-differential privacy with \emph{strong (primal-dual) gap} rate of $\tilde O\big(\frac{1}{\sqrt{n}} + \frac{\sqrt{d}}{n\epsilon}\big)$, where $n$ is the dataset size and $d$ is the dimension of the problem. This rate is nearly optimal, based on existing lower bounds in differentially private stochastic optimization. Specifically, we prove a tight upper bound on the strong gap via novel implementation and analysis of the recursive regularization technique repurposed for saddle point problems. We show that this rate can be attained with $O\big(\min\big\{\frac{n^2\epsilon^{1.5}}{\sqrt{d}}, n^{3/2}\big\}\big)$ gradient complexity, and $\tilde{O}(n)$ gradient complexity if the loss function is smooth. As a byproduct of our method, we develop a general algorithm that, given a black-box access to a subroutine satisfying a certain $\alpha$ primal-dual accuracy guarantee with respect to the empirical objective, gives a solution to the stochastic saddle point problem with a strong gap of $\tilde{O}(\alpha+\frac{1}{\sqrt{n}})$. We show that this $\alpha$-accuracy condition is satisfied by standard algorithms for the empirical saddle point problem such as the proximal point method and the stochastic gradient descent ascent algorithm. Further, we show that even for simple problems it is possible for an algorithm to have zero weak gap and suffer from $\Omega(1)$ strong gap. We also show that there exists a fundamental tradeoff between stability and accuracy. Specifically, we show that any $\Delta$-stable algorithm has empirical gap $\Omega\big(\frac{1}{\Delta n}\big)$, and that this bound is tight. This result also holds, more specifically, for empirical risk minimization problems and may be of independent interest.
    Structure from Voltage. (arXiv:2203.00063v2 [cs.LG] UPDATED)
    Effective resistance (ER) is an attractive way to interrogate the structure of graphs. It is an alternative to computing the eigenvectors of the graph Laplacian. Graph Laplacians are used to find low-dimensional structure in high-dimensional data, and here too, ER-based analysis has advantages over eigenvector-based methods. Unfortunately, von Luxburg et al. (2010) showed that, when vertices correspond to a sample from a distribution over a metric space, the limit of the ER between distant points converges to a trivial quantity that holds no information about the structure of the graph. We show that by scaling the resistances in a graph with $n$ vertices by $n^2$, one obtains a meaningful limit of the voltages and of the effective resistances. We also show that adding a "ground" node to a metric graph gives a simple and natural way to compute all of the distances from a chosen point to all other points.
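    For intuition, the effective resistance between two nodes of an unweighted graph can be computed by grounding one node and solving the resulting Laplacian linear system for the voltages. A minimal pure-Python sketch (this illustrates plain ER, not the paper's $n^2$-scaled construction; the function name is illustrative):

```python
def effective_resistance(n, edges, s, t):
    """Effective resistance between nodes s and t: inject one unit of
    current at s, extract it at t (node t grounded), and read off the
    voltage difference."""
    # Build the graph Laplacian L = D - A.
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0
    # Ground node t: drop its row and column, leaving a nonsingular system.
    idx = [i for i in range(n) if i != t]
    A = [[L[i][j] for j in idx] for i in idx]
    b = [1.0 if i == s else 0.0 for i in idx]
    # Solve A x = b by Gaussian elimination with partial pivoting.
    m = len(A)
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (b[r] - sum(A[r][c] * x[c] for c in range(r + 1, m))) / A[r][r]
    # With v_t = 0, the ER is simply the voltage at s.
    return x[idx.index(s)]
```

    On a path of two edges the resistances add in series (ER = 2), and on a triangle two adjacent nodes see 1 Ω in parallel with 2 Ω (ER = 2/3), matching the physical circuit analogy.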
    Lossy Image Compression with Conditional Diffusion Models. (arXiv:2209.06950v5 [eess.IV] UPDATED)
    This paper outlines an end-to-end optimized lossy image compression framework using diffusion generative models. The approach relies on the transform coding paradigm, where an image is mapped into a latent space for entropy coding and, from there, mapped back to the data space for reconstruction. In contrast to VAE-based neural compression, where the (mean) decoder is a deterministic neural network, our decoder is a conditional diffusion model. Our approach thus introduces an additional "content" latent variable on which the reverse diffusion process is conditioned and uses this variable to store information about the image. The remaining "texture" variables characterizing the diffusion process are synthesized at decoding time. We show that the model's performance can be tuned toward perceptual metrics of interest. Our extensive experiments involving multiple datasets and image quality assessment metrics show that our approach yields stronger reported FID scores than the GAN-based model, while also yielding competitive performance with VAE-based models in several distortion metrics. Furthermore, training the diffusion with X-parameterization enables high-quality reconstructions in only a handful of decoding steps, greatly improving the model's practicality.
    Local Risk Bounds for Statistical Aggregation. (arXiv:2306.17151v1 [math.ST])
    In the problem of aggregation, the aim is to combine a given class of base predictors to achieve predictions nearly as accurate as the best one. In this flexible framework, no assumption is made on the structure of the class or the nature of the target. Aggregation has been studied in both sequential and statistical contexts. Despite some important differences between the two problems, the classical results in both cases feature the same global complexity measure. In this paper, we revisit and tighten classical results in the theory of aggregation in the statistical setting by replacing the global complexity with a smaller, local one. Some of our proofs build on the PAC-Bayes localization technique introduced by Catoni. Among other results, we prove localized versions of the classical bound for the exponential weights estimator due to Leung and Barron and deviation-optimal bounds for the Q-aggregation estimator. These bounds improve over the results of Dai, Rigollet and Zhang for fixed design regression and the results of Lecu\'e and Rigollet for random design regression.
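    The exponential weights estimator discussed above combines base predictors with weights that decay exponentially in each predictor's empirical loss. A toy sketch of that aggregation rule (function names and the `temperature` parameter are illustrative; the paper's analysis, not this recipe, is its contribution):

```python
import math

def exponential_weights(losses, temperature=1.0):
    """Aggregation weights for base predictors given their empirical
    losses: w_k proportional to exp(-loss_k / temperature), normalized
    to sum to one."""
    scores = [math.exp(-l / temperature) for l in losses]
    total = sum(scores)
    return [s / total for s in scores]

def aggregate(predictions, weights):
    """Weighted combination of scalar base predictions."""
    return sum(w * p for w, p in zip(weights, predictions))
```

    Predictors with equal losses receive equal weight, and lowering a predictor's loss strictly increases its share, so the aggregate tracks the best base predictor without ever committing to a single one.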
    Simulation of Human and Artificial Emotion (SHArE). (arXiv:2011.02151v2 [cs.AI] UPDATED)
    The framework for Simulation of Human and Artificial Emotion (SHArE) describes the architecture of emotion in terms of parameters transferable between psychology, neuroscience, and artificial intelligence. These parameters can be defined as abstract concepts or granularized down to the voltage levels of individual neurons. This model enables emotional trajectory design for humans which may lead to novel therapeutic solutions for various mental health concerns. For artificial intelligence, this work provides a compact notation which can be applied to neural networks as a means to observe the emotions and motivations of machines.
    Language Models as Knowledge Embeddings. (arXiv:2206.12617v3 [cs.CL] UPDATED)
    Knowledge embeddings (KE) represent a knowledge graph (KG) by embedding entities and relations into continuous vector spaces. Existing methods are mainly structure-based or description-based. Structure-based methods learn representations that preserve the inherent structure of KGs. They cannot adequately represent the abundant long-tail entities in real-world KGs, which have limited structural information. Description-based methods leverage textual information and language models. Prior approaches in this direction barely outperform structure-based ones, and suffer from problems like expensive negative sampling and restrictive description demands. In this paper, we propose LMKE, which adopts Language Models to derive Knowledge Embeddings, aiming at both enriching representations of long-tail entities and solving the problems of prior description-based methods. We formulate description-based KE learning within a contrastive learning framework to improve efficiency in training and evaluation. Experimental results show that LMKE achieves state-of-the-art performance on KE benchmarks of link prediction and triple classification, especially for long-tail entities.
    A Comparison of Reinforcement Learning Frameworks for Software Testing Tasks. (arXiv:2208.12136v3 [cs.SE] UPDATED)
    Software testing activities scrutinize the artifacts and the behavior of a software product to find possible defects and ensure that the product meets its expected requirements. Recently, Deep Reinforcement Learning (DRL) has been successfully employed in complex testing tasks such as game testing, regression testing, and test case prioritization to automate the process and provide continuous adaptation. Practitioners can employ DRL by implementing from scratch a DRL algorithm or using a DRL framework. DRL frameworks offer well-maintained implemented state-of-the-art DRL algorithms to facilitate and speed up the development of DRL applications. Developers have widely used these frameworks to solve problems in various domains including software testing. However, to the best of our knowledge, there is no study that empirically evaluates the effectiveness and performance of implemented algorithms in DRL frameworks. Moreover, some guidelines are lacking from the literature that would help practitioners choose one DRL framework over another. In this paper, we empirically investigate the applications of carefully selected DRL algorithms on two important software testing tasks: test case prioritization in the context of Continuous Integration (CI) and game testing. For the game testing task, we conduct experiments on a simple game and use DRL algorithms to explore the game to detect bugs. Results show that some of the selected DRL frameworks such as Tensorforce outperform recent approaches in the literature. To prioritize test cases, we run experiments on a CI environment where DRL algorithms from different frameworks are used to rank the test cases. Our results show that the performance difference between implemented algorithms in some cases is considerable, motivating further investigation.
    Continual Learning for Predictive Maintenance: Overview and Challenges. (arXiv:2301.12467v2 [cs.LG] UPDATED)
    Deep learning techniques have become one of the main propellers for solving engineering problems effectively and efficiently. For instance, Predictive Maintenance methods have been used to improve predictions of when maintenance is needed on different machines and operative contexts. However, deep learning methods are not without limitations, as these models are normally trained on a fixed distribution that only reflects the current state of the problem. Due to internal or external factors, the state of the problem can change, and the performance decreases due to the lack of generalization and adaptation. Contrary to this stationary training set, real-world applications change their environments constantly, creating the need to constantly adapt the model to evolving scenarios. To aid in this endeavor, Continual Learning methods propose ways to constantly adapt prediction models and incorporate new knowledge after deployment. Despite the advantages of these techniques, there are still challenges to applying them to real-world problems. In this work, we present a brief introduction to predictive maintenance, non-stationary environments, and continual learning, together with an extensive review of the current state of applying continual learning in real-world applications and specifically in predictive maintenance. We then discuss the current challenges of both predictive maintenance and continual learning, proposing future directions at the intersection of both areas. Finally, we propose a novel way to create benchmarks that favor the application of continuous learning methods in more realistic environments, giving specific examples of predictive maintenance.
    Transformers over Directed Acyclic Graphs. (arXiv:2210.13148v5 [cs.LG] UPDATED)
    Transformer models have recently gained popularity in graph representation learning as they have the potential to learn complex relationships beyond the ones captured by regular graph neural networks. The main research question is how to inject the structural bias of graphs into the transformer architecture, and several proposals have been made for undirected molecular graphs and, recently, also for larger network graphs. In this paper, we study transformers over directed acyclic graphs (DAGs) and propose architecture adaptations tailored to DAGs: (1) An attention mechanism that is considerably more efficient than the regular quadratic complexity of transformers and at the same time faithfully captures the DAG structure, and (2) a positional encoding of the DAG's partial order, complementing the former. We rigorously evaluate our approach over various types of tasks, ranging from classifying source code graphs to nodes in citation networks, and show that it is effective in two important aspects: in making graph transformers generally outperform graph neural networks tailored to DAGs and in improving SOTA graph transformer performance in terms of both quality and efficiency.
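    One way to inject a DAG's structural bias, loosely in the spirit of adaptation (1) above, is to precompute which node pairs are related by the partial order and restrict attention to those pairs. A toy reachability-mask sketch (the paper's actual mechanism is more efficient than materializing this explicit all-pairs mask; the function name is illustrative):

```python
def reachability_mask(n, edges):
    """Boolean mask M where M[i][j] is True iff node j is reachable
    from node i in the DAG (including i itself). A DAG-aware
    transformer can zero out attention for pairs not related by the
    partial order instead of attending over all n*n pairs."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    mask = [[False] * n for _ in range(n)]
    for start in range(n):
        stack, seen = [start], {start}
        while stack:  # depth-first search from `start`
            u = stack.pop()
            mask[start][u] = True
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return mask
```

    On sparse DAGs most entries of this mask are False, which is where the efficiency gain over dense quadratic attention comes from.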
    Superiority of GNN over NN in generalizing bandlimited functions. (arXiv:2206.05904v7 [cs.LG] UPDATED)
    Graph Neural Network (GNN) with its ability to integrate graph information has been widely used for data analyses. However, the expressive power of GNN has only been studied for graph-level tasks but not for node-level tasks, such as node classification, where one tries to interpolate missing nodal labels from the observed ones. In this paper, we study the expressive power of GNN for the said classification task, which is in essence a function interpolation problem. Explicitly, we derive the number of weights and layers needed for a GNN to interpolate a band-limited function in $\mathbb{R}^d$. Our result shows that the number of weights needed to $\epsilon$-approximate a bandlimited function using the GNN architecture is far smaller than the best known bound for a fully connected neural network (NN) - in particular, one only needs $O((\log \epsilon^{-1})^{d})$ weights using a GNN trained by $O((\log \epsilon^{-1})^{d})$ samples to $\epsilon$-approximate a discretized bandlimited signal in $\mathbb{R}^d$. The result is obtained by drawing a connection between the GNN structure and the classical sampling theorems, making our work the first attempt in this direction.
    On the Predictive Accuracy of Neural Temporal Point Process Models for Continuous-time Event Data. (arXiv:2306.17066v1 [cs.LG])
    Temporal Point Processes (TPPs) serve as the standard mathematical framework for modeling asynchronous event sequences in continuous time. However, classical TPP models are often constrained by strong assumptions, limiting their ability to capture complex real-world event dynamics. To overcome this limitation, researchers have proposed Neural TPPs, which leverage neural network parametrizations to offer more flexible and efficient modeling. While recent studies demonstrate the effectiveness of Neural TPPs, they often lack a unified setup, relying on different baselines, datasets, and experimental configurations. This makes it challenging to identify the key factors driving improvements in predictive accuracy, hindering research progress. To bridge this gap, we present a comprehensive large-scale experimental study that systematically evaluates the predictive accuracy of state-of-the-art neural TPP models. Our study encompasses multiple real-world and synthetic event sequence datasets, following a carefully designed unified setup. We thoroughly investigate the influence of major architectural components such as event encoding, history encoder, and decoder parametrization on both time and mark prediction tasks. Additionally, we delve into the less explored area of probabilistic calibration for neural TPP models. By analyzing our results, we draw insightful conclusions regarding the significance of history size and the impact of architectural components on predictive accuracy. Furthermore, we shed light on the miscalibration of mark distributions in neural TPP models. Our study aims to provide valuable insights into the performance and characteristics of neural TPP models, contributing to a better understanding of their strengths and limitations.
    Joint Multi-view Unsupervised Feature Selection and Graph Learning. (arXiv:2204.08247v2 [cs.CV] UPDATED)
    Despite significant progress, previous multi-view unsupervised feature selection methods mostly suffer from two limitations. First, they generally utilize either cluster structure or similarity structure to guide the feature selection, which neglect the possibility of a joint formulation with mutual benefits. Second, they often learn the similarity structure by either global structure learning or local structure learning, which lack the capability of graph learning with both global and local structural awareness. In light of this, this paper presents a joint multi-view unsupervised feature selection and graph learning (JMVFG) approach. Particularly, we formulate the multi-view feature selection with orthogonal decomposition, where each target matrix is decomposed into a view-specific basis matrix and a view-consistent cluster indicator. The cross-space locality preservation is incorporated to bridge the cluster structure learning in the projected space and the similarity learning (i.e., graph learning) in the original space. Further, a unified objective function is presented to enable the simultaneous learning of the cluster structure, the global and local similarity structures, and the multi-view consistency and inconsistency, upon which an alternating optimization algorithm is developed with theoretically proved convergence. Extensive experiments on a variety of real-world multi-view datasets demonstrate the superiority of our approach for both the multi-view feature selection and graph learning tasks. The code is available at https://github.com/huangdonghere/JMVFG.
    Gradient flows on graphons: existence, convergence, continuity equations. (arXiv:2111.09459v3 [math.PR] UPDATED)
    Wasserstein gradient flows on probability measures have found a host of applications in various optimization problems. They typically arise as the continuum limit of exchangeable particle systems evolving by some mean-field interaction involving a gradient-type potential. However, in many problems, such as in multi-layer neural networks, the so-called particles are edge weights on large graphs whose nodes are exchangeable. Such large graphs are known to converge to continuum limits called graphons as their size grow to infinity. We show that the Euclidean gradient flow of a suitable function of the edge-weights converges to a novel continuum limit given by a curve on the space of graphons that can be appropriately described as a gradient flow or, more technically, a curve of maximal slope. Several natural functions on graphons, such as homomorphism functions and the scalar entropy, are covered by our set-up, and the examples have been worked out in detail.
    Synthetic Demographic Data Generation for Card Fraud Detection Using GANs. (arXiv:2306.17109v1 [cs.LG])
    Using machine learning models to generate synthetic data has become common in many fields. Technology to generate synthetic transactions that can be used to detect fraud is also growing fast. Generally, this synthetic data contains only information about the transaction, such as the time, place, and amount of money. It does not usually contain the individual user's characteristics (age and gender are occasionally included). Using relatively complex synthetic demographic data may enrich the feature set of the transaction data and thus improve fraud detection performance. Benefiting from developments in machine learning, some deep learning models have the potential to perform better than other well-established synthetic data generation methods, such as microsimulation. In this study, we built a deep-learning Generative Adversarial Network (GAN), called DGGAN, to be used for demographic data generation. Our model generates samples during model training, which we found important to overcome class imbalance issues. This study can help improve the understanding of synthetic data and further explore the application of synthetic data generation to card fraud detection.
    Explainability in Practice: Estimating Electrification Rates from Mobile Phone Data in Senegal. (arXiv:2211.06277v2 [cs.CY] UPDATED)
    Explainable artificial intelligence (XAI) provides explanations for otherwise non-interpretable machine learning (ML) models. While many technical approaches exist, there is a lack of validation of these techniques on real-world datasets. In this work, we present a use case of XAI: an ML model trained to estimate electrification rates from mobile phone data in Senegal. The data originate from the Data for Development challenge by Orange in 2014/15. We apply two model-agnostic, local explanation techniques and find that while the model can be verified, it is biased with respect to population density. We conclude our paper by pointing to the two main challenges we encountered during our work: data processing and model design that might be restricted by currently available XAI methods, and the importance of domain knowledge for interpreting explanations.
    Identifying Important Sensory Feedback for Learning Locomotion Skills. (arXiv:2306.17101v1 [cs.RO])
    Robot motor skills can be learned through deep reinforcement learning (DRL) by neural networks as state-action mappings. While the selection of state observations is crucial, there has been a lack of quantitative analysis to date. Here, we present a systematic saliency analysis that quantitatively evaluates the relative importance of different feedback states for motor skills learned through DRL. Our approach can identify the most essential feedback states for locomotion skills, including balance recovery, trotting, bounding, pacing and galloping. By using only key states including joint positions, gravity vector, base linear and angular velocities, we demonstrate that a simulated quadruped robot can achieve robust performance in various test scenarios across these distinct skills. The benchmarks using task performance metrics show that locomotion skills learned with key states can achieve comparable performance to those with all states, and the task performance or learning success rate will drop significantly if key states are missing. This work provides quantitative insights into the relationship between state observations and specific types of motor skills, serving as a guideline for robot motor learning. The proposed method is applicable to differentiable state-action mapping, such as neural network based control policies, enabling the learning of a wide range of motor skills with minimal sensing dependencies.
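    The saliency analysis described above ranks feedback states by their influence on the learned policy. As a crude stand-in for the authors' method (which operates on trained DRL policies), a finite-difference sketch where `policy` is any scalar-output callable and the names are illustrative:

```python
def state_saliency(policy, state, eps=1e-4):
    """Rank input state dimensions by the magnitude of the policy
    output's finite-difference sensitivity: a larger value for
    dimension i suggests the i-th feedback signal matters more to
    the learned behavior at this operating point."""
    base = policy(state)
    saliency = []
    for i in range(len(state)):
        bumped = list(state)
        bumped[i] += eps  # perturb one feedback channel at a time
        saliency.append(abs(policy(bumped) - base) / eps)
    return saliency
```

    For a toy linear policy `lambda s: 3.0 * s[0] + 0.1 * s[1]`, the first state dominates the saliency ranking, mirroring how the paper identifies joint positions and base velocities as key states. For differentiable neural policies, exact gradients (e.g., via automatic differentiation) would replace this finite-difference loop.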
    MAGIC: Mask-Guided Image Synthesis by Inverting a Quasi-Robust Classifier. (arXiv:2209.11549v2 [cs.CV] UPDATED)
    We offer a method for one-shot mask-guided image synthesis that allows controlled manipulation of a single image by inverting a quasi-robust classifier equipped with strong regularizers. Our proposed method, entitled MAGIC, leverages structured gradients from a pre-trained quasi-robust classifier to preserve the input semantics while maintaining its classification accuracy, thereby guaranteeing credibility in the synthesis. Unlike current methods that use complex primitives to supervise the process or use attention maps as a weak supervisory signal, MAGIC aggregates gradients over the input, driven by a binary guide mask that enforces a strong, spatial prior. MAGIC implements a series of manipulations within a single framework, achieving shape and location control, intense non-rigid shape deformations, and copy/move operations in the presence of repeating objects, and gives users firm control over the synthesis by requiring them simply to specify binary guide masks. Our study and findings are supported by various qualitative comparisons with the state-of-the-art on the same images sampled from ImageNet and quantitative analysis using machine perception, along with a user survey of 100+ participants that endorses our synthesis quality. Project page at https://mozhdehrouhsedaghat.github.io/magic.html. Code is available at https://github.com/mozhdehrouhsedaghat/magic
    Data-free Defense of Black Box Models Against Adversarial Attacks. (arXiv:2211.01579v2 [cs.LG] UPDATED)
    Companies often safeguard their trained deep models (i.e., details of architecture, learnt weights, training details, etc.) from third-party users by exposing them only as black boxes through APIs. Moreover, they may not even provide access to the training data due to proprietary reasons or sensitivity concerns. In this work, we propose a novel defense mechanism for black box models against adversarial attacks in a data-free setup. We construct synthetic data via a generative model and train a surrogate network using model stealing techniques. To minimize adversarial contamination on perturbed samples, we propose a 'wavelet noise remover' (WNR) that performs discrete wavelet decomposition on input images and carefully selects only a few important coefficients determined by our 'wavelet coefficient selection module' (WCSM). To recover the high-frequency content of the image after noise removal via WNR, we further train a 'regenerator' network with the objective of retrieving coefficients such that the reconstructed image yields predictions on the surrogate model similar to those for the original image. At test time, WNR combined with the trained regenerator network is prepended to the black box network, resulting in a high boost in adversarial accuracy. Our method improves the adversarial accuracy on CIFAR-10 by 38.98% and 32.01% on state-of-the-art Auto Attack compared to the baseline, even when the attacker uses a surrogate architecture (Alexnet-half or Alexnet) similar to the black box architecture (Alexnet) with the same model stealing strategy as the defender. The code is available at https://github.com/vcl-iisc/data-free-black-box-defense
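    The core idea behind wavelet-based noise removal (decompose, keep only a few important coefficients, reconstruct) can be illustrated with a one-level 1D Haar transform. This toy sketch is not the paper's WNR/WCSM, which operates on images with a learned coefficient-selection criterion; all names here are illustrative:

```python
def haar_forward(signal):
    """One level of the 1D Haar transform: pairwise averages followed
    by pairwise half-differences (signal length must be even)."""
    avg = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    diff = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return avg + diff

def haar_inverse(coeffs):
    """Exact inverse of haar_forward."""
    half = len(coeffs) // 2
    out = []
    for a, d in zip(coeffs[:half], coeffs[half:]):
        out += [a + d, a - d]
    return out

def denoise(signal, keep):
    """Keep only the `keep` largest-magnitude detail coefficients,
    zero the rest, then reconstruct -- a toy analogue of selecting a
    few important wavelet coefficients to strip high-frequency noise."""
    coeffs = haar_forward(signal)
    half = len(coeffs) // 2
    details = sorted(range(half, len(coeffs)), key=lambda i: -abs(coeffs[i]))
    kept = set(details[:keep])
    cleaned = [c if i < half or i in kept else 0.0
               for i, c in enumerate(coeffs)]
    return haar_inverse(cleaned)
```

    With all coefficients kept the round trip is exact; dropping small detail coefficients discards exactly the fine-grained content in which adversarial perturbations tend to live, which is why the paper then needs a regenerator network to restore legitimate high-frequency detail.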
    On-device modeling of user's social context and familiar places from smartphone-embedded sensor data. (arXiv:2205.08790v2 [cs.LG] UPDATED)
    Context modeling and recognition represent complex tasks that allow mobile and ubiquitous computing applications to adapt to the user's situation. Current solutions mainly focus on limited context information, generally processed on centralized architectures, potentially exposing users' personal data to privacy leakage and missing personalization features. For these reasons, on-device context modeling and recognition represent the current research trend in this area. Among the different information characterizing the user's context in mobile environments, social interactions and visited locations contribute remarkably to the characterization of daily life scenarios. In this paper we propose a novel, unsupervised and lightweight approach to model the user's social context and her locations based on ego networks directly on the user's mobile device. Relying on this model, the system is able to extract high-level and semantically rich context features from smartphone-embedded sensor data. Specifically, for the social context it exploits data related to both physical and cyber social interactions among users and their devices. As far as the location context is concerned, we assume that modeling the familiarity degree of a specific location is more relevant for the user's context than the raw location data, both in terms of GPS coordinates and proximity devices. Using 5 real-world datasets, we assess the structure of the social and location ego networks, provide a semantic evaluation of the proposed models, and evaluate their complexity in terms of mobile computing performance. Finally, we demonstrate the relevance of the extracted features by showing the performance of 3 machine learning algorithms in recognizing daily-life situations, obtaining an improvement of 3% in AUROC, 9% in Precision, and 5% in Recall with respect to using only features related to the physical context.
    RL4CO: an Extensive Reinforcement Learning for Combinatorial Optimization Benchmark. (arXiv:2306.17100v1 [cs.LG])
    We introduce RL4CO, an extensive reinforcement learning (RL) for combinatorial optimization (CO) benchmark. RL4CO employs state-of-the-art software libraries as well as best practices in implementation, such as modularity and configuration management, to be efficient and easily modifiable by researchers for adaptations of neural network architecture, environments, and algorithms. Contrary to the existing focus on specific tasks like the traveling salesman problem (TSP) for performance assessment, we underline the importance of scalability and generalization capabilities for diverse optimization tasks. We also systematically benchmark sample efficiency, zero-shot generalization, and adaptability to changes in data distributions of various models. Our experiments show that some recent state-of-the-art methods fall behind their predecessors when evaluated using these new metrics, suggesting the necessity of a more balanced view of the performance of neural CO solvers. We hope RL4CO will encourage the exploration of novel solutions to complex real-world tasks, allowing comparison with existing methods through a standardized interface that decouples the science from the software engineering. We make our library publicly available at https://github.com/kaist-silab/rl4co.
    Safe Model-Based Multi-Agent Mean-Field Reinforcement Learning. (arXiv:2306.17052v1 [cs.LG])
    Many applications, e.g., in shared mobility, require coordinating a large number of agents. Mean-field reinforcement learning addresses the resulting scalability challenge by optimizing the policy of a representative agent. In this paper, we address an important generalization where there exist global constraints on the distribution of agents (e.g., requiring capacity constraints or minimum coverage requirements to be met). We propose Safe-$\text{M}^3$-UCRL, the first model-based algorithm that attains safe policies even in the case of unknown transition dynamics. As a key ingredient, it uses epistemic uncertainty in the transition model within a log-barrier approach to ensure pessimistic constraint satisfaction with high probability. We showcase Safe-$\text{M}^3$-UCRL on the vehicle repositioning problem faced by many shared mobility operators and evaluate its performance through simulations built on Shenzhen taxi trajectory data. Our algorithm effectively meets the demand in critical areas while ensuring service accessibility in regions with low demand.
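    A hedged sketch of the log-barrier idea referenced above: the objective gains a term that diverges as a *pessimistic* estimate of a constrained quantity approaches its bound. The additive mean-plus-std form of the pessimistic estimate and all parameter names are illustrative assumptions, not Safe-$\text{M}^3$-UCRL's exact formulation.

```python
import math

# Hedged sketch: a log-barrier on a capacity constraint, tightened by
# epistemic uncertainty. mean + beta*std stands in for a high-probability
# upper bound on the true agent density (an assumed form).

def barrier_objective(reward, density_mean, density_std, capacity,
                      beta=2.0, t=10.0):
    """Reward plus a log-barrier on the pessimistic constraint margin."""
    pessimistic_density = density_mean + beta * density_std
    margin = capacity - pessimistic_density
    if margin <= 0:
        return float("-inf")          # pessimistically infeasible
    # The barrier term -> -inf as the margin shrinks, so optimizing this
    # objective keeps the pessimistic density strictly below capacity.
    return reward + (1.0 / t) * math.log(margin)
```

    Note how higher model uncertainty (larger `density_std`) shrinks the margin and makes the barrier bite earlier, which is how epistemic uncertainty yields constraint satisfaction with high probability.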
    An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training. (arXiv:2306.17165v1 [cs.CV])
    We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently. Despite considerable progress in multi-task learning, most efforts focus on learning from multi-label data: a single image set with multiple task labels. Such multi-label data sets are rare, small, and expensive. We say heterogeneous to refer to image sets with different task labels, or to combinations of single-task datasets. Few have explored training on such heterogeneous datasets. General-purpose vision models are still dominated by single-task pretraining, and it remains unclear how to scale up multi-task models by leveraging mainstream vision datasets designed for different purposes. The challenges lie in managing large intrinsic differences among vision tasks, including data distribution, architectures, task-specific modules, dataset scales, and sampling strategies. To address these challenges, we propose to modify and scale up mixture-of-experts (MoE) vision transformers, so that they can simultaneously learn classification, detection, and segmentation on diverse mainstream vision datasets including ImageNet, COCO, and ADE20K. Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks. Due to its emergent modularity, this general-purpose model decomposes into high-performing components, efficiently adapting to downstream tasks. We can fine-tune it with fewer training parameters, fewer model parameters, and less computation. Additionally, its modularity allows for easy expansion in continual-learning-without-forgetting scenarios. Finally, these functions can be controlled and combined to meet various demands of downstream tasks.
    Toward a Perspectivist Turn in Ground Truthing for Predictive Computing. (arXiv:2109.04270v3 [cs.LG] UPDATED)
    Most Artificial Intelligence applications are based on supervised machine learning (ML), which is ultimately grounded in manually annotated data. The annotation process is often performed in terms of a majority vote, and this has often proved problematic, as highlighted by recent studies on the evaluation of ML models. In this article we describe and advocate for a different paradigm, which we call data perspectivism, which moves away from traditional gold standard datasets towards the adoption of methods that integrate the opinions and perspectives of the human subjects involved in the knowledge representation step of ML processes. Drawing on previous works that inspired our proposal, we describe its potential not only for the more subjective tasks (e.g. those related to human language) but also for tasks commonly understood as objective (e.g. medical decision making), and present the main advantages of adopting a perspectivist stance in ML, as well as possible disadvantages, and various ways in which such a stance can be implemented in practice. Finally, we share a set of recommendations and outline a research agenda to advance the perspectivist stance in ML.
    ManimML: Communicating Machine Learning Architectures with Animation. (arXiv:2306.17108v1 [cs.LG])
    There has been an explosion in interest in machine learning (ML) in recent years due to its applications to science and engineering. However, as ML techniques have advanced, tools for explaining and visualizing novel ML algorithms have lagged behind. Animation has been shown to be a powerful tool for making engaging visualizations of systems that dynamically change over time, which makes it well suited to the task of communicating ML algorithms. However, the current approach to animating ML algorithms is to handcraft applications that highlight specific algorithms or use complex generalized animation software. We developed ManimML, an open-source Python library for easily generating animations of ML algorithms directly from code. We sought to leverage ML practitioners' preexisting knowledge of programming rather than requiring them to learn complex animation software. ManimML has a familiar syntax for specifying neural networks that mimics popular deep learning frameworks like PyTorch. A user can take a preexisting neural network architecture and easily write a specification for an animation in ManimML, which will then automatically compose animations for different components of the system into a final animation of the entire neural network. ManimML is open source and available at https://github.com/helblazer811/ManimML.
    Surgical Phase and Instrument Recognition: How to identify appropriate Dataset Splits. (arXiv:2306.16879v1 [cs.LG])
    Purpose: The development of machine learning models for surgical workflow and instrument recognition from temporal data represents a challenging task due to the complex nature of surgical workflows. In particular, the imbalanced distribution of data is one of the major challenges in the domain of surgical workflow recognition. In order to obtain meaningful results, careful partitioning of data into training, validation, and test sets, as well as the selection of suitable evaluation metrics are crucial. Methods: In this work, we present an openly available web-based application that enables interactive exploration of dataset partitions. The proposed visual framework facilitates the assessment of dataset splits for surgical workflow recognition, especially with regard to identifying sub-optimal dataset splits. Currently, it supports visualization of surgical phase and instrument annotations. Results: In order to validate the dedicated interactive visualizations, we use a dataset split of the Cholec80 dataset. This dataset split was specifically selected to reflect a case of strong data imbalance. Using our software, we were able to identify phases, phase transitions, and combinations of surgical instruments that were not represented in one of the sets. Conclusion: In order to obtain meaningful results in highly unbalanced class distributions, special care should be taken with respect to the selection of an appropriate split. Interactive data visualization represents a promising approach for the assessment of machine learning datasets. The source code is available at https://github.com/Cardio-AI/endovis-ml
    End-to-end Autonomous Driving: Challenges and Frontiers. (arXiv:2306.16927v1 [cs.RO])
    The autonomous driving community has witnessed a rapid growth in approaches that embrace an end-to-end algorithm framework, utilizing raw sensor input to generate vehicle motion plans, instead of concentrating on individual tasks such as detection and motion prediction. End-to-end systems, in comparison to modular pipelines, benefit from joint feature optimization for perception and planning. This field has flourished due to the availability of large-scale datasets, closed-loop evaluation, and the increasing need for autonomous driving algorithms to perform effectively in challenging scenarios. In this survey, we provide a comprehensive analysis of more than 250 papers, covering the motivation, roadmap, methodology, challenges, and future trends in end-to-end autonomous driving. We delve into several critical challenges, including multi-modality, interpretability, causal confusion, robustness, and world models, amongst others. Additionally, we discuss current advancements in foundation models and visual pre-training, as well as how to incorporate these techniques within the end-to-end driving framework. To facilitate future research, we maintain an active repository that contains up-to-date links to relevant literature and open-source projects at https://github.com/OpenDriveLab/End-to-end-Autonomous-Driving.
    The ELM Neuron: an Efficient and Expressive Cortical Neuron Model Can Solve Long-Horizon Tasks. (arXiv:2306.16922v1 [cs.NE])
    Traditional large-scale neuroscience models and machine learning utilize simplified models of individual neurons, relying on collective activity and properly adjusted connections to perform complex computations. However, each biological cortical neuron is inherently a sophisticated computational device, as corroborated in a recent study where it took a deep artificial neural network with millions of parameters to replicate the input-output relationship of a detailed biophysical model of a cortical pyramidal neuron. We question the necessity for these many parameters and introduce the Expressive Leaky Memory (ELM) neuron, a biologically inspired, computationally expressive, yet efficient model of a cortical neuron. Remarkably, our ELM neuron requires only 8K trainable parameters to match the aforementioned input-output relationship accurately. We find that an accurate model necessitates multiple memory-like hidden states and intricate nonlinear synaptic integration. To assess the computational ramifications of this design, we evaluate the ELM neuron on various tasks with demanding temporal structures, including a sequential version of the CIFAR-10 classification task, the challenging Pathfinder-X task, and a new dataset based on the Spiking Heidelberg Digits dataset. Our ELM neuron outperforms most transformer-based models on the Pathfinder-X task with 77% accuracy, demonstrates competitive performance on Sequential CIFAR-10, and superior performance compared to classic LSTM models on the variant of the Spiking Heidelberg Digits dataset. These findings indicate a potential for biologically motivated, computationally efficient neuronal models to enhance performance in challenging machine learning tasks.
    NAUTILUS: boosting Bayesian importance nested sampling with deep learning. (arXiv:2306.16923v1 [astro-ph.IM])
    We introduce a novel approach to boost the efficiency of the importance nested sampling (INS) technique for Bayesian posterior and evidence estimation using deep learning. Unlike rejection-based sampling methods such as vanilla nested sampling (NS) or Markov chain Monte Carlo (MCMC) algorithms, importance sampling techniques can use all likelihood evaluations for posterior and evidence estimation. However, for efficient importance sampling, one needs proposal distributions that closely mimic the posterior distributions. We show how to combine INS with deep learning via neural network regression to accomplish this task. We also introduce NAUTILUS, a reference open-source Python implementation of this technique for Bayesian posterior and evidence estimation. We compare NAUTILUS against popular NS and MCMC packages, including EMCEE, DYNESTY, ULTRANEST and POCOMC, on a variety of challenging synthetic problems and real-world applications in exoplanet detection, galaxy SED fitting and cosmology. In all applications, the sampling efficiency of NAUTILUS is substantially higher than that of all other samplers, often by more than an order of magnitude. Simultaneously, NAUTILUS delivers highly accurate results and needs fewer likelihood evaluations than all other samplers tested. We also show that NAUTILUS has good scaling with the dimensionality of the likelihood and is easily parallelizable to many CPUs.
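    The core quantity here, the Bayesian evidence, can be estimated with plain importance sampling, which is the mechanism INS builds on: Z = (1/N) Σ L(θᵢ)π(θᵢ)/q(θᵢ) for θᵢ drawn from a proposal q. The Gaussian toy likelihood, uniform prior, and hand-picked proposal below are illustrative only; NAUTILUS's contribution is learning a good q with a neural network rather than fixing it by hand.

```python
import math
import random

# Hedged sketch of plain importance sampling for Bayesian evidence.
# Toy problem: Gaussian likelihood centered at 2, uniform prior on
# [-10, 10], so the true evidence is ~0.05 (the prior density).

random.seed(0)

def likelihood(theta):
    return math.exp(-0.5 * (theta - 2.0) ** 2) / math.sqrt(2 * math.pi)

def prior_pdf(theta):
    return 0.05 if -10 <= theta <= 10 else 0.0

def proposal_sample():
    # A proposal roughly matching the posterior; a learned proposal
    # (as in NAUTILUS) would make the weights even lower-variance.
    return random.gauss(2.0, 1.5)

def proposal_pdf(theta):
    return math.exp(-0.5 * ((theta - 2.0) / 1.5) ** 2) / (1.5 * math.sqrt(2 * math.pi))

samples = [proposal_sample() for _ in range(20000)]
weights = [likelihood(t) * prior_pdf(t) / proposal_pdf(t) for t in samples]
evidence = sum(weights) / len(weights)
```

    Every likelihood evaluation contributes to the estimate, which is the efficiency argument the abstract makes for importance-based methods over rejection-based nested sampling.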
    Macro Placement by Wire-Mask-Guided Black-Box Optimization. (arXiv:2306.16844v1 [cs.LG])
    The development of very large-scale integration (VLSI) technology has posed new challenges for electronic design automation (EDA) techniques in chip floorplanning. During this process, macro placement is an important subproblem, which tries to determine the positions of all macros with the aim of minimizing half-perimeter wirelength (HPWL) and avoiding overlapping. Previous methods include packing-based, analytical and reinforcement learning methods. In this paper, we propose a new black-box optimization (BBO) framework (called WireMask-BBO) for macro placement, by using a wire-mask-guided greedy procedure for objective evaluation. Equipped with different BBO algorithms, WireMask-BBO empirically achieves significant improvements over previous methods, i.e., achieves significantly shorter HPWL by using much less time. Furthermore, it can fine-tune existing placements by treating them as initial solutions, which can bring up to 50% improvement in HPWL. WireMask-BBO has the potential to significantly improve the quality and efficiency of chip floorplanning, which makes it appealing to researchers and practitioners in EDA and will also promote the application of BBO.
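    The objective WireMask-BBO minimizes, half-perimeter wirelength (HPWL), has a simple standard definition: for each net, the half-perimeter of the bounding box of its pins. The toy pin coordinates below are made up for illustration.

```python
# Half-perimeter wirelength (HPWL), the standard macro-placement
# objective: sum over nets of (bbox width + bbox height) of the pins.

def hpwl(nets):
    """nets: list of nets, each a list of (x, y) pin coordinates."""
    total = 0.0
    for pins in nets:
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

# Toy example: two nets, each with a 3x4 bounding box -> 7 + 7 = 14.
toy_nets = [[(0, 0), (3, 4)], [(1, 1), (1, 5), (4, 1)]]
```

    In a BBO framing like the paper's, a candidate placement is mapped to pin coordinates and scored by this function (plus overlap handling), so any black-box optimizer can be plugged in.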
    Spectral Batch Normalization: Normalization in the Frequency Domain. (arXiv:2306.16999v1 [cs.CV])
    Regularization is a set of techniques that are used to improve the generalization ability of deep neural networks. In this paper, we introduce spectral batch normalization (SBN), a novel effective method to improve generalization by normalizing feature maps in the frequency (spectral) domain. The activations of residual networks without batch normalization (BN) tend to explode exponentially in the depth of the network at initialization. This leads to extremely large feature map norms even though the parameters are relatively small. These explosive dynamics can be very detrimental to learning. BN makes weight decay regularization on the scaling factors $\gamma, \beta$ approximately equivalent to an additive penalty on the norm of the feature maps, which prevents extremely large feature map norms to a certain degree. However, we show experimentally that, despite the approximate additive penalty of BN, feature maps in deep neural networks (DNNs) tend to explode at the beginning of the network and that feature maps of DNNs contain large values during the whole training. This phenomenon also occurs in a weakened form in non-residual networks. SBN addresses large feature maps by normalizing them in the frequency domain. In our experiments, we empirically show that SBN prevents exploding feature maps at initialization and large feature map values during the training. Moreover, the normalization of feature maps in the frequency domain leads to more uniformly distributed frequency components. This discourages DNNs from relying on single frequency components of feature maps. These, together with other effects of SBN, have a regularizing effect on the training of residual and non-residual networks. We show experimentally that using SBN in addition to standard regularization methods improves the performance of DNNs by a relevant margin, e.g. ResNet50 on ImageNet by 0.71%.
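    A hedged sketch of the core SBN mechanism: transform each feature map to the frequency domain, standardize every frequency component across the batch, and transform back. A naive DFT on 1-D "feature maps" stands in for the paper's 2-D implementation, and the fixed epsilon with no learnable scale/shift is a simplification.

```python
import cmath

# Hedged sketch of spectral batch normalization: per-frequency
# standardization across the batch, done in the Fourier domain.

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (illustrative only)."""
    n, sign = len(x), (1 if inverse else -1)
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * j * k / n)
               for k in range(n)) for j in range(n)]
    return [v / n for v in out] if inverse else out

def spectral_batch_norm(batch, eps=1e-5):
    """batch: list of equal-length 1-D feature maps (lists of floats)."""
    spectra = [dft(fm) for fm in batch]
    m = len(spectra)
    normed = []
    for s in spectra:
        out = []
        for j in range(len(s)):
            comp = [sp[j] for sp in spectra]      # j-th component, whole batch
            mean = sum(comp) / m
            var = sum(abs(c - mean) ** 2 for c in comp) / m
            out.append((s[j] - mean) / (var + eps) ** 0.5)
        normed.append(out)
    # Back to the spatial domain; real inputs stay numerically real.
    return [[v.real for v in dft(s, inverse=True)] for s in normed]

batch = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
normed = spectral_batch_norm(batch)
```

    Because standardization is applied per frequency, no single frequency component can dominate, which matches the abstract's claim about more uniformly distributed frequency components.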
    milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing. (arXiv:2306.17010v1 [cs.CV])
    Approaching the era of ubiquitous computing, human motion sensing plays a crucial role in smart systems for decision making, user interaction, and personalized services. Extensive research has been conducted on human tracking, pose estimation, gesture recognition, and activity recognition, which are predominantly based on cameras in traditional methods. However, the intrusive nature of cameras limits their use in smart home applications. To address this, mmWave radars have gained popularity due to their privacy-friendly features. In this work, we propose milliFlow, a novel deep learning method for scene flow estimation as complementary motion information for mmWave point clouds, serving as an intermediate level of features and directly benefiting downstream human motion sensing tasks. Experimental results demonstrate the superior performance of our method with an average 3D endpoint error of 4.6cm, significantly surpassing the competing approaches. Furthermore, by incorporating scene flow information, we achieve remarkable improvements in human activity recognition, human parsing, and human body part tracking. To foster further research in this area, we provide our codebase and dataset for open access.
    Graph Sampling-based Meta-Learning for Molecular Property Prediction. (arXiv:2306.16780v1 [cs.LG])
    Molecular property is usually observed with a limited number of samples, and researchers have considered property prediction as a few-shot problem. One important fact that has been ignored by prior works is that each molecule can be recorded with several different properties simultaneously. To effectively utilize many-to-many correlations of molecules and properties, we propose a Graph Sampling-based Meta-learning (GS-Meta) framework for few-shot molecular property prediction. First, we construct a Molecule-Property relation Graph (MPG): molecule and properties are nodes, while property labels decide edges. Then, to utilize the topological information of MPG, we reformulate an episode in meta-learning as a subgraph of the MPG, containing a target property node, molecule nodes, and auxiliary property nodes. Third, as episodes in the form of subgraphs are no longer independent of each other, we propose to schedule the subgraph sampling process with a contrastive loss function, which considers the consistency and discrimination of subgraphs. Extensive experiments on 5 commonly-used benchmarks show GS-Meta consistently outperforms state-of-the-art methods by 5.71%-6.93% in ROC-AUC and verify the effectiveness of each proposed module. Our code is available at https://github.com/HICAI-ZJU/GS-Meta.
    Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering. (arXiv:2306.16713v1 [cs.CV])
    We study visual question answering in a setting where the answer has to be mined from a pool of relevant and irrelevant images given as a context. For such a setting, a model must first retrieve relevant images from the pool and answer the question from these retrieved images. We refer to this problem as retrieval-based visual question answering (or RETVQA in short). The RETVQA is distinctively different and more challenging than the traditionally-studied Visual Question Answering (VQA), where a given question has to be answered with a single relevant image in context. Towards solving the RETVQA task, we propose a unified Multi Image BART (MI-BART) that takes a question and retrieved images using our relevance encoder for free-form fluent answer generation. Further, we introduce the largest dataset in this space, namely RETVQA, which has the following salient features: multi-image and retrieval requirement for VQA, metadata-independent questions over a pool of heterogeneous images, expecting a mix of classification-oriented and open-ended generative answers. Our proposed framework achieves an accuracy of 76.5% and a fluency of 79.3% on the proposed dataset, namely RETVQA and also outperforms state-of-the-art methods by 4.9% and 11.8% on the image segment of the publicly available WebQA dataset on the accuracy and fluency metrics, respectively.
    CLIPAG: Towards Generator-Free Text-to-Image Generation. (arXiv:2306.16805v1 [cs.CV])
    Perceptually Aligned Gradients (PAG) refer to an intriguing property observed in robust image classification models, wherein their input gradients align with human perception and carry semantic meaning. While this phenomenon has gained significant research attention, it was solely studied in the context of unimodal vision-only architectures. In this work, we extend the study of PAG to Vision-Language architectures, which form the foundations for diverse image-text tasks and applications. Through an adversarial robustification finetuning of CLIP, we demonstrate that robust Vision-Language models exhibit PAG in contrast to their vanilla counterparts. This work reveals the merits of CLIP with PAG (CLIPAG) in several vision-language generative tasks. Notably, we show that seamlessly integrating CLIPAG in a "plug-n-play" manner leads to substantial improvements in vision-language generative applications. Furthermore, leveraging its PAG property, CLIPAG enables text-to-image generation without any generative model, which typically requires huge generators.
    Diffusion-Jump GNNs: Homophiliation via Learnable Metric Filters. (arXiv:2306.16976v1 [cs.LG])
    High-order Graph Neural Networks (HO-GNNs) have been developed to infer consistent latent spaces in the heterophilic regime, where the label distribution is not correlated with the graph structure. However, most of the existing HO-GNNs are hop-based, i.e., they rely on the powers of the transition matrix. As a result, these architectures are not fully reactive to the classification loss and the achieved structural filters have static supports. In other words, neither the filters' supports nor their coefficients can be learned with these networks. They are confined, instead, to learning combinations of filters. To address the above concerns, we propose Diffusion-Jump GNNs, a method relying on asymptotic diffusion distances that operates on jumps. A diffusion-pump generates pairwise distances whose projections determine both the support and coefficients of each structural filter. These filters are called jumps because they explore a wide range of scales in order to find bonds between scattered nodes with the same label. Actually, the full process is controlled by the classification loss. Both the jumps and the diffusion distances react to classification errors (i.e. they are learnable). Homophiliation, i.e., the process of learning piecewise smooth latent spaces in the heterophilic regime, is formulated as a Dirichlet problem: the known labels determine the border nodes and the diffusion-pump ensures a minimal deviation of the semi-supervised grouping from a canonical unsupervised grouping. This triggers the update of both the diffusion distances and, consequently, the jumps in order to minimize the classification error. The Dirichlet formulation has several advantages. It leads to the definition of structural heterophily, a novel measure beyond edge heterophily. It also allows us to investigate links with (learnable) diffusion distances, absorbing random walks and stochastic diffusion.
    Weight Compander: A Simple Weight Reparameterization for Regularization. (arXiv:2306.16993v1 [cs.LG])
    Regularization is a set of techniques that are used to improve the generalization ability of deep neural networks. In this paper, we introduce weight compander (WC), a novel effective method to improve generalization by reparameterizing each weight in deep neural networks using a nonlinear function. It is a general, intuitive, cheap, and easy-to-implement method, which can be combined with various other regularization techniques. Large weights in deep neural networks are a sign of a more complex network that is overfitted to the training data. Moreover, regularized networks tend to have a greater range of weights around zero with fewer weights centered at zero. We introduce a weight reparameterization function which is applied to each weight and implicitly reduces overfitting by restricting the magnitude of the weights while forcing them away from zero at the same time. This leads to more democratic decision-making in the network. Firstly, individual weights cannot have too much influence in the prediction process due to the restriction of their magnitude. Secondly, more weights are used in the prediction process, since they are forced away from zero during the training. This promotes the extraction of more features from the input data and increases the level of weight redundancy, which makes the network less sensitive to statistical differences between training and test data. We extend our method to learn the hyperparameters of the introduced weight reparameterization function. This avoids hyperparameter search and gives the network the opportunity to align the weight reparameterization with the training progress. We show experimentally that using weight compander in addition to standard regularization methods improves the performance of neural networks.
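    A hedged sketch of the weight-compander idea: pass each raw weight through a nonlinear function that (i) caps its magnitude and (ii) amplifies small weights away from zero. The specific function alpha*tanh(beta*w) and its hyperparameters are illustrative guesses with those two stated properties, not the paper's actual reparameterization.

```python
import math

# Hedged sketch: a hypothetical compander with the two properties the
# abstract describes: bounded magnitude, and small weights pushed away
# from zero (slope beta > 1 near the origin).

def compand(w, alpha=1.0, beta=3.0):
    """Effective weight used in the forward pass; |output| < alpha."""
    return alpha * math.tanh(beta * w)

# During training, gradients would flow through `compand` to the raw
# weight w, so the restriction acts as an implicit regularizer rather
# than a hard clip applied after each update.
raw = [-5.0, -0.1, 0.05, 2.0]
effective = [compand(w) for w in raw]
```

    Making `alpha` and `beta` trainable per layer corresponds to the hyperparameter-learning extension mentioned at the end of the abstract.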
    Safety-Aware Task Composition for Discrete and Continuous Reinforcement Learning. (arXiv:2306.17033v1 [cs.LG])
    Compositionality is a critical aspect of scalable system design. Reinforcement learning (RL) has recently shown substantial success in task learning, but has only recently begun to truly leverage composition. In this paper, we focus on Boolean composition of learned tasks as opposed to functional or sequential composition. Existing Boolean composition for RL focuses on reaching a satisfying absorbing state in environments with discrete action spaces, but does not support composable safety (i.e., avoidance) constraints. We advance the state of the art in Boolean composition of learned tasks with three contributions: i) introduce two distinct notions of safety in this framework; ii) show how to enforce either safety semantics, prove correctness (under some assumptions), and analyze the trade-offs between the two safety notions; and iii) extend Boolean composition from discrete action spaces to continuous action spaces. We demonstrate these techniques using modified versions of value iteration in a grid world, Deep Q-Network (DQN) in a grid world with image observations, and Twin Delayed DDPG (TD3) in a continuous-observation and continuous-action Bullet physics environment. We believe that these contributions advance the theory of safe reinforcement learning by allowing zero-shot composition of policies satisfying safety properties.
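    The Boolean composition being extended here can be sketched with toy Q-tables: in prior work on Boolean RL composition, conjunction of learned tasks is an element-wise minimum over Q-values and disjunction a maximum. This paper's contributions (safety semantics, continuous actions) sit on top of this kind of zero-shot composition; the toy values below are made up.

```python
# Hedged sketch of zero-shot Boolean composition of learned Q-functions,
# indexed by (state, action) pairs. Toy Q-tables only.

def q_and(q1, q2):
    """Conjunction: act well for both goals."""
    return {sa: min(q1[sa], q2[sa]) for sa in q1}

def q_or(q1, q2):
    """Disjunction: act well for at least one goal."""
    return {sa: max(q1[sa], q2[sa]) for sa in q1}

# Q-values learned separately for two goals A and B.
q_goal_a = {("s0", "left"): 1.0, ("s0", "right"): 0.2}
q_goal_b = {("s0", "left"): 0.1, ("s0", "right"): 0.9}

q_both = q_and(q_goal_a, q_goal_b)    # satisfy A and B
q_either = q_or(q_goal_a, q_goal_b)   # satisfy A or B
```

    A safety (avoidance) constraint composed this way must additionally guarantee that unsafe states are never entered along the way, which is the gap the paper's two safety semantics address.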
    Traceable Group-Wise Self-Optimizing Feature Transformation Learning: A Dual Optimization Perspective. (arXiv:2306.16893v1 [cs.LG])
    Feature transformation aims to reconstruct an effective representation space by mathematically refining the existing features. It serves as a pivotal approach to combat the curse of dimensionality, enhance model generalization, mitigate data sparsity, and extend the applicability of classical models. Existing research predominantly focuses on domain knowledge-based feature engineering or learning latent representations. However, these methods, while insightful, lack full automation and fail to yield a traceable and optimal representation space. An indispensable question arises: Can we concurrently address these limitations when reconstructing a feature space for a machine-learning task? Our initial work took a pioneering step towards this challenge by introducing a novel self-optimizing framework. This framework leverages the power of three cascading reinforced agents to automatically select candidate features and operations for generating improved feature transformation combinations. Despite the impressive strides made, there was room for enhancing its effectiveness and generalization capability. In this extended journal version, we advance our initial work from two distinct yet interconnected perspectives: 1) We propose a refinement of the original framework, which integrates a graph-based state representation method to capture the feature interactions more effectively and develop different Q-learning strategies to alleviate Q-value overestimation further. 2) We utilize a new optimization technique (actor-critic) to train the entire self-optimizing framework in order to accelerate the model convergence and improve the feature transformation performance. Finally, to validate the improved effectiveness and generalization capability of our framework, we perform extensive experiments and conduct comprehensive analyses.
    Gesture Recognition with mmWave Wi-Fi Access Points: Lessons Learned. (arXiv:2306.17062v1 [cs.LG])
    In recent years, channel state information (CSI) at sub-6 GHz has been widely exploited for Wi-Fi sensing, particularly for activity and gesture recognition. In this work, we instead explore mmWave (60 GHz) Wi-Fi signals for gesture recognition/pose estimation. Our focus is on the mmWave Wi-Fi signals so that they can be used not only for high data rate communication but also for improved sensing, e.g., for extended reality (XR) applications. For this reason, we extract spatial beam signal-to-noise ratios (SNRs) from the periodic beam training employed by IEEE 802.11ad devices. We consider a set of 10 gestures/poses motivated by XR applications. We conduct experiments in two environments and with three people. As a comparison, we also collect CSI from IEEE 802.11ac devices. To extract features from the CSI and the beam SNR, we leverage a deep neural network (DNN). The DNN classifier achieves promising results on the beam SNR task with state-of-the-art 96.7% accuracy in a single environment, even with a limited dataset. We also investigate the robustness of the beam SNR against CSI across different environments. Our experiments reveal that features from the CSI generalize without additional re-training, while those from beam SNRs do not. Therefore, re-training is required in the latter case.
    SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores. (arXiv:2306.16688v1 [cs.DC])
    The ever-growing complexity of reinforcement learning (RL) tasks demands a distributed RL system to efficiently generate and process a massive amount of data to train intelligent agents. However, existing open-source libraries suffer from various limitations, which impede their practical use in challenging scenarios where large-scale training is necessary. While industrial systems from OpenAI and DeepMind have achieved successful large-scale RL training, their system architecture and implementation details remain undisclosed to the community. In this paper, we present a novel abstraction on the dataflows of RL training, which unifies practical RL training across diverse applications into a general framework and enables fine-grained optimizations. Following this abstraction, we develop a scalable, efficient, and extensible distributed RL system called ReaLly Scalable RL (SRL). The system architecture of SRL separates major RL computation components and allows massively parallelized training. Moreover, SRL offers user-friendly and extensible interfaces for customized algorithms. Our evaluation shows that SRL outperforms existing academic libraries in both a single machine and a medium-sized cluster. In a large-scale cluster, the novel architecture of SRL leads to up to 3.7x speedup compared to the design choices adopted by the existing libraries. We also conduct a direct benchmark comparison to OpenAI's industrial system, Rapid, in the challenging hide-and-seek environment. SRL reproduces the same solution as reported by OpenAI with up to 5x speedup in wall-clock time. Furthermore, we also examine the performance of SRL in a much harder variant of the hide-and-seek environment and achieve substantial learning speedup by scaling SRL to over 15k CPU cores and 32 A100 GPUs. Notably, SRL is the first in the academic community to perform RL experiments at such a large scale.
    Provable Advantage of Curriculum Learning on Parity Targets with Mixed Inputs. (arXiv:2306.16921v1 [cs.LG])
    Experimental results have shown that curriculum learning, i.e., presenting simpler examples before more complex ones, can improve the efficiency of learning. Some recent theoretical results also showed that changing the sampling distribution can help neural networks learn parities, with formal results only for large learning rates and one-step arguments. Here we show a separation result in the number of training steps with standard (bounded) learning rates on a common sample distribution: if the data distribution is a mixture of sparse and dense inputs, there exists a regime in which a 2-layer ReLU neural network trained by a curriculum noisy-GD (or SGD) algorithm that uses sparse examples first, can learn parities of sufficiently large degree, while any fully connected neural network of possibly larger width or depth trained by noisy-GD on the unordered samples cannot learn without additional steps. We also provide experimental results supporting the qualitative separation beyond the specific regime of the theoretical results.
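The mixed input distribution at the heart of this separation is easy to reproduce. Below is a minimal sketch in plain Python (the support set, sparsity level, and batch sizes are illustrative assumptions, not the paper's constants) of a curriculum that serves sparse parity examples before dense ones:

```python
import random

def parity_example(n, support, p_one):
    """Draw x in {0,1}^n with i.i.d. Bernoulli(p_one) bits; the label is
    the parity of the bits indexed by `support`."""
    x = [1 if random.random() < p_one else 0 for _ in range(n)]
    return x, sum(x[i] for i in support) % 2

def curriculum_batch(n, support, m_sparse, m_dense, p_sparse=0.1):
    """Sparse examples (few active bits) first, then dense ones (p = 1/2):
    the ordering the separation result shows curriculum noisy-GD can exploit."""
    sparse = [parity_example(n, support, p_sparse) for _ in range(m_sparse)]
    dense = [parity_example(n, support, 0.5) for _ in range(m_dense)]
    return sparse + dense

random.seed(0)
data = curriculum_batch(n=20, support=[0, 3, 7], m_sparse=4, m_dense=4)
```

The non-curriculum baseline in the result simply trains on the unordered mixture of the same examples; only the presentation order differs.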
    Joint Level Generation and Translation Using Gameplay Videos. (arXiv:2306.16662v1 [cs.CV])
    Procedural Content Generation via Machine Learning (PCGML) faces a significant hurdle that sets it apart from other fields such as image or text generation: limited annotated data. Many existing methods for procedural level generation via machine learning require a secondary representation besides level images, yet the current methods for obtaining such representations are laborious and time-consuming, which compounds the problem. In this work, we address this problem by utilizing gameplay videos of two human-annotated games to develop a novel multi-tail framework that learns to perform simultaneous level translation and generation. The translation tail of our framework can convert gameplay video frames to an equivalent secondary representation, while its generation tail can produce novel level segments. Evaluation results and comparisons between our framework and baselines suggest that combining the level generation and translation tasks leads to improved overall performance on both tasks. This represents a possible solution to limited annotated level data, and we demonstrate the potential for future versions to generalize to unseen games.
    AutoML in Heavily Constrained Applications. (arXiv:2306.16913v1 [cs.LG])
    Optimizing a machine learning pipeline for a task at hand requires careful configuration of various hyperparameters, typically supported by an AutoML system that optimizes the hyperparameters for the given training dataset. Yet, depending on the AutoML system's own second-order meta-configuration, the performance of the AutoML process can vary significantly. Current AutoML systems cannot automatically adapt their own configuration to a specific use case. Further, they cannot comply with user-defined application constraints on the effectiveness and efficiency of the pipeline and its generation. In this paper, we propose Caml, which uses meta-learning to automatically adapt its own AutoML parameters, such as the search strategy, the validation strategy, and the search space, for a task at hand. The dynamic AutoML strategy of Caml takes user-defined constraints into account and obtains constraint-satisfying pipelines with high predictive performance.
    The Drunkard's Odometry: Estimating Camera Motion in Deforming Scenes. (arXiv:2306.16917v1 [cs.CV])
    Estimating camera motion in deformable scenes poses a complex and open research challenge. Most existing non-rigid structure-from-motion techniques assume that static scene parts are observed alongside the deforming ones in order to establish an anchoring reference. However, this assumption does not hold in relevant applications such as endoscopy. Deformable odometry and SLAM pipelines, which tackle the most challenging scenario of exploratory trajectories, suffer from a lack of robustness and of proper quantitative evaluation methodologies. To tackle this issue with a common benchmark, we introduce the Drunkard's Dataset, a challenging collection of synthetic data targeting visual navigation and reconstruction in deformable environments. This dataset is the first large set of exploratory camera trajectories with ground truth inside 3D scenes where every surface exhibits non-rigid deformations over time. Simulations in realistic 3D buildings let us obtain a vast amount of data and ground-truth labels, including camera poses, RGB images and depth, optical flow and normal maps at high resolution and quality. We further present a novel deformable odometry method, dubbed the Drunkard's Odometry, which decomposes optical flow estimates into rigid-body camera motion and non-rigid scene deformations. To validate our data, our work also includes an evaluation of several baselines as well as a novel tracking error metric which does not require ground-truth data. Dataset and code: https://davidrecasens.github.io/TheDrunkard'sOdometry/
    Applying language models to algebraic topology: generating simplicial cycles using multi-labeling in Wu's formula. (arXiv:2306.16951v1 [math.AT])
    Computing homotopy groups of spheres has long been a fundamental objective in algebraic topology. Various theoretical and algorithmic approaches have been developed to tackle this problem. In this paper we take a step towards the goal of comprehending the group-theoretic structure of the generators of these homotopy groups by leveraging the power of machine learning. Specifically, in the simplicial group setting of Wu's formula, we reformulate the problem of generating simplicial cycles as a problem of sampling from the intersection of algorithmic datasets related to Dyck languages. We present and evaluate language modelling approaches that employ multi-label information for input sequences, along with the necessary group-theoretic toolkit and non-neural baselines.
    Policy Space Diversity for Non-Transitive Games. (arXiv:2306.16884v1 [cs.GT])
    Policy-Space Response Oracles (PSRO) is an influential algorithm framework for approximating a Nash Equilibrium (NE) in multi-agent non-transitive games. Many previous studies have been trying to promote policy diversity in PSRO. A major weakness in existing diversity metrics is that a more diverse (according to their diversity metrics) population does not necessarily mean (as we proved in the paper) a better approximation to a NE. To alleviate this problem, we propose a new diversity metric, the improvement of which guarantees a better approximation to a NE. Meanwhile, we develop a practical and well-justified method to optimize our diversity metric using only state-action samples. By incorporating our diversity regularization into the best response solving in PSRO, we obtain a new PSRO variant, Policy Space Diversity PSRO (PSD-PSRO). We present the convergence property of PSD-PSRO. Empirically, extensive experiments on various games demonstrate that PSD-PSRO is more effective in producing significantly less exploitable policies than state-of-the-art PSRO variants.
    Transfer Learning with Semi-Supervised Dataset Annotation for Birdcall Classification. (arXiv:2306.16760v1 [cs.SD])
    We present working notes on transfer learning with semi-supervised dataset annotation for the BirdCLEF 2023 competition, focused on identifying African bird species in recorded soundscapes. Our approach utilizes existing off-the-shelf models, BirdNET and MixIT, to address representation and labeling challenges in the competition. We explore the embedding space learned by BirdNET and propose a process to derive an annotated dataset for supervised learning. Our experiments involve various models and feature engineering approaches to maximize performance on the competition leaderboard. The results demonstrate the effectiveness of our approach in classifying bird species and highlight the potential of transfer learning and semi-supervised dataset annotation in similar tasks.
    Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging. (arXiv:2306.16788v1 [cs.LG])
    Neural networks can be significantly compressed by pruning, leading to sparse models that require considerably less storage and fewer floating-point operations while maintaining predictive performance. Model soups (Wortsman et al., 2022) improve generalization and out-of-distribution performance by averaging the parameters of multiple models into a single one, without increased inference time. However, identifying models in the same loss basin to leverage both sparsity and parameter averaging is challenging, as averaging arbitrary sparse models reduces the overall sparsity due to differing sparse connectivities. In this work, we address these challenges by demonstrating that exploring a single retraining phase of Iterative Magnitude Pruning (IMP) with varying hyperparameter configurations, such as batch ordering or weight decay, produces models that are suitable for averaging and share the same sparse connectivity by design. Averaging these models significantly enhances generalization performance compared to their individual components. Building on this idea, we introduce Sparse Model Soups (SMS), a novel method for merging sparse models that initiates each prune-retrain cycle with the averaged model of the previous phase. SMS maintains sparsity, retains the benefits of sparse networks, is modular and fully parallelizable, and substantially improves IMP's performance. Additionally, we demonstrate that SMS can be adapted to enhance the performance of state-of-the-art pruning-during-training approaches.
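The averaging step that makes this work can be sketched in a few lines. The toy below operates on flattened weight vectors with a shared 0/1 pruning mask (the prune-retrain loop and the mask derivation from IMP are omitted; names and values are illustrative):

```python
def average_sparse_models(models, mask):
    """Model-soup averaging for pruned networks: all models share one
    sparsity mask (1 = kept weight), so the soup has exactly the same
    sparse connectivity as each ingredient model."""
    n = len(models)
    return [sum(m[i] for m in models) / n if mask[i] else 0.0
            for i in range(len(mask))]

# Two retrained versions of the same pruned network (flattened weights);
# the mask comes from one pruning step and is shared by design.
mask = [1, 0, 1, 1, 0]
models = [[0.2, 0.0, -0.4, 1.0, 0.0],
          [0.4, 0.0, -0.2, 0.6, 0.0]]
soup = average_sparse_models(models, mask)
```

Averaging arbitrary sparse models would instead union their masks and reduce overall sparsity, which is exactly the failure mode the shared-connectivity design avoids.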
    Query-based Hard-Image Retrieval for Object Detection at Test Time. (arXiv:2209.11559v2 [cs.CV] UPDATED)
    There is a longstanding interest in capturing the error behaviour of object detectors by finding images where their performance is likely to be unsatisfactory. In real-world applications such as autonomous driving, it is also crucial to characterise potential failures beyond simple requirements of detection performance. For example, a missed detection of a pedestrian close to an ego vehicle will generally require closer inspection than a missed detection of a car in the distance. The problem of predicting such potential failures at test time has largely been overlooked in the literature and conventional approaches based on detection uncertainty fall short in that they are agnostic to such fine-grained characterisation of errors. In this work, we propose to reformulate the problem of finding "hard" images as a query-based hard image retrieval task, where queries are specific definitions of "hardness", and offer a simple and intuitive method that can solve this task for a large family of queries. Our method is entirely post-hoc, does not require ground-truth annotations, is independent of the choice of a detector, and relies on an efficient Monte Carlo estimation that uses a simple stochastic model in place of the ground-truth. We show experimentally that it can be applied successfully to a wide variety of queries for which it can reliably identify hard images for a given detector without any labelled data. We provide results on ranking and classification tasks using the widely used RetinaNet, Faster-RCNN, Mask-RCNN, and Cascade Mask-RCNN object detectors. The code for this project is available at https://github.com/fiveai/hardest.
    Improving Online Continual Learning Performance and Stability with Temporal Ensembles. (arXiv:2306.16817v1 [cs.LG])
    Neural networks are very effective when trained on large datasets for a large number of iterations. However, when they are trained on non-stationary streams of data in an online fashion, their performance is reduced (1) by the online setup, which limits the availability of data, and (2) by catastrophic forgetting, caused by the non-stationary nature of the data. Furthermore, several recent works (Caccia et al., 2022; Lange et al., 2023) showed that replay methods used in continual learning suffer from the stability gap, encountered when evaluating the model continually (rather than only at task boundaries). In this article, we study the effect of model ensembling as a way to improve performance and stability in online continual learning. We observe that naively ensembling models from a variety of training tasks considerably increases performance in online continual learning. Starting from this observation, and drawing inspiration from semi-supervised learning ensembling methods, we use a lightweight temporal ensemble that computes the exponential moving average (EMA) of the weights at test time, and show that it can drastically increase performance and stability when used in combination with several methods from the literature.
    Private Covariance Approximation and Eigenvalue-Gap Bounds for Complex Gaussian Perturbations. (arXiv:2306.16648v1 [cs.DS])
    We consider the problem of approximating a $d \times d$ covariance matrix $M$ with a rank-$k$ matrix under $(\varepsilon,\delta)$-differential privacy. We present and analyze a complex variant of the Gaussian mechanism and show that the Frobenius norm of the difference between the matrix output by this mechanism and the best rank-$k$ approximation to $M$ is bounded by roughly $\tilde{O}(\sqrt{kd})$, whenever there is an appropriately large gap between the $k$'th and the $k+1$'th eigenvalues of $M$. This improves on previous work that requires that the gap between every pair of top-$k$ eigenvalues of $M$ is at least $\sqrt{d}$ for a similar bound. Our analysis leverages the fact that the eigenvalues of complex matrix Brownian motion repel more than in the real case, and uses Dyson's stochastic differential equations governing the evolution of its eigenvalues to show that the eigenvalues of the matrix $M$ perturbed by complex Gaussian noise have large gaps with high probability. Our results contribute to the analysis of low-rank approximations under average-case perturbations and to an understanding of eigenvalue gaps for random matrices, which may be of independent interest.
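In symbols, the mechanism described above can be sketched as follows (a schematic reading of the abstract with assumed notation: $M_k$ is the best rank-$k$ approximation of $M$, $\Pi_k$ projects onto the top-$k$ eigenspace of its argument, and $\sigma$ is calibrated to $(\varepsilon,\delta)$):

```latex
\widehat{M} \;=\; \Pi_k\!\left(M + Z\right), \qquad
Z = Z^{*}, \quad Z_{ij} \sim \mathcal{CN}\!\left(0, \sigma^{2}\right)
\ \text{i.i.d. for } i < j,
```

with the guarantee that $\|\widehat{M} - M_k\|_F = \tilde{O}(\sqrt{kd})$ whenever the $k$-th and $(k{+}1)$-th eigenvalues of $M$ are sufficiently separated.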
    Eigensubspace of Temporal-Difference Dynamics and How It Improves Value Approximation in Reinforcement Learning. (arXiv:2306.16750v1 [cs.LG])
    We propose a novel value approximation method, namely Eigensubspace Regularized Critic (ERC) for deep reinforcement learning (RL). ERC is motivated by an analysis of the dynamics of Q-value approximation error in the Temporal-Difference (TD) method, which follows a path defined by the 1-eigensubspace of the transition kernel associated with the Markov Decision Process (MDP). It reveals a fundamental property of TD learning that has remained unused in previous deep RL approaches. In ERC, we propose a regularizer that guides the approximation error tending towards the 1-eigensubspace, resulting in a more efficient and stable path of value approximation. Moreover, we theoretically prove the convergence of the ERC method. Besides, theoretical analysis and experiments demonstrate that ERC effectively reduces the variance of value functions. Among 26 tasks in the DMControl benchmark, ERC outperforms state-of-the-art methods for 20. Besides, it shows significant advantages in Q-value approximation and variance reduction. Our code is available at https://sites.google.com/view/erc-ecml23/.
    Length of Stay prediction for Hospital Management using Domain Adaptation. (arXiv:2306.16823v1 [cs.LG])
    Inpatient length of stay (LoS) is an important managerial metric which, if known in advance, can be used to efficiently plan admissions, allocate resources and improve care. Using historical patient data and machine learning techniques, LoS prediction models can be developed. Ethically, these models cannot be used for patient discharge decisions in lieu of unit heads, but they are of utmost necessity for hospital management systems in charge of effective hospital planning. Therefore, the design of the prediction system should be adapted to work in a true hospital setting. In this study, we predict early hospital LoS at the granular level of admission units by applying domain adaptation to leverage information learned from a potential source domain. Time-varying data from 110,079 and 60,492 patient stays to 8 and 9 intensive care units were respectively extracted from eICU-CRD and MIMIC-IV. These were fed into a Long Short-Term Memory network and a fully connected network to train a source domain model, the weights of which were transferred either partially or fully to initiate training in target domains. Shapley Additive exPlanations (SHAP) algorithms were used to study the effect of weight transfer on model explainability. Compared to the benchmark, the proposed weight transfer model showed statistically significant gains in prediction accuracy (between 1% and 5%) as well as in computation time (savings of up to 2 hours) for some target domains. The proposed method thus provides an adapted clinical decision support system for hospital management that eases the burdens of ethics-committee data access, computation infrastructure, and training time.
    An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs. (arXiv:2306.16601v1 [cs.LG])
    In recent years, Transformer-based language models have become the standard approach for natural language processing tasks. However, stringent throughput and latency requirements in industrial applications limit their adoption. To mitigate the gap, model compression techniques such as structured pruning are being used to improve inference efficiency. However, most existing neural network inference runtimes lack adequate support for structured sparsity. In this paper, we propose an efficient sparse deep learning inference software stack for Transformer-based language models where the weights are pruned with a constant block size. Our sparse software accelerator leverages Intel Deep Learning Boost to maximize the performance of sparse matrix - dense matrix multiplication (commonly abbreviated as SpMM) on CPUs. Our SpMM kernel outperforms existing sparse libraries (oneMKL, TVM, and LIBXSMM) by an order of magnitude on a wide range of GEMM shapes under 5 representative sparsity ratios (70%, 75%, 80%, 85%, 90%). Moreover, our SpMM kernel shows up to 5x speedup over the dense GEMM kernel of oneDNN, a well-optimized dense library widely used in industry. We apply our sparse accelerator to widely used Transformer-based language models including Bert-Mini, DistilBERT, Bert-Base, and BERT-Large. Our sparse inference software shows up to 1.5x speedup over Neural Magic's Deepsparse under the same configurations on Xeon instances on Amazon Web Services under proxy production latency constraints. We also compare our solution with two framework-based inference solutions, ONNX Runtime and PyTorch, and demonstrate up to 37x speedup over ONNX Runtime and 345x over PyTorch on Xeon under the latency constraints. All the source code is publicly available on GitHub: https://github.com/intel/intel-extension-for-transformers.
    Learning from Synthetic Human Group Activities. (arXiv:2306.16772v1 [cs.CV])
    The understanding of complex human interactions and group activities has garnered attention in human-centric computer vision. However, the advancement of the related tasks is hindered due to the difficulty of obtaining large-scale labeled real-world datasets. To mitigate the issue, we propose M3Act, a multi-view multi-group multi-person human atomic action and group activity data generator. Powered by the Unity engine, M3Act contains simulation-ready 3D scenes and human assets, configurable lighting and camera systems, highly parameterized modular group activities, and a large degree of domain randomization during the data generation process. Our data generator is capable of generating large-scale datasets of human activities with multiple viewpoints, modalities (RGB images, 2D poses, 3D motions), and high-quality annotations for individual persons and multi-person groups (2D bounding boxes, instance segmentation masks, individual actions and group activity categories). Using M3Act, we perform synthetic data pre-training for 2D skeleton-based group activity recognition and RGB-based multi-person pose tracking. The results indicate that learning from our synthetic datasets largely improves the model performances on real-world datasets, with the highest gain of 5.59% and 7.32% respectively in group and person recognition accuracy on CAD2, as well as an improvement of 6.63 in MOTP on HiEve. Pre-training with our synthetic data also leads to faster model convergence on downstream tasks (up to 6.8% faster). Moreover, M3Act opens new research problems for 3D group activity generation. We release M3Act3D, an 87.6-hour 3D motion dataset of human activities with larger group sizes and higher complexity of inter-person interactions than previous multi-person datasets. We define multiple metrics and propose a competitive baseline for the novel task.
    OSP: Boosting Distributed Model Training with 2-stage Synchronization. (arXiv:2306.16926v1 [cs.DC])
    Distributed deep learning (DDL) is a promising research area which aims to increase the efficiency of training deep learning tasks with large datasets and models. As the computation capability of DDL nodes continues to increase, the network connection between nodes is becoming a major bottleneck. Various methods of gradient compression and improved model synchronization have been proposed to address this bottleneck in Parameter-Server-based DDL. However, these two types of methods can result in accuracy loss due to discarded gradients and offer only limited enhancement of model synchronization throughput, respectively. To address these challenges, we propose a new model synchronization method named Overlapped Synchronization Parallel (OSP), which achieves efficient communication with a 2-stage synchronization approach and uses Local-Gradient-based Parameter correction (LGP) to avoid accuracy loss caused by stale parameters. A prototype of OSP has been implemented using PyTorch and evaluated on commonly used deep learning models and datasets with a 9-node testbed. Evaluation results show that OSP can achieve up to 50% improvement in throughput without accuracy loss compared to popular synchronization models.
    SaGess: Sampling Graph Denoising Diffusion Model for Scalable Graph Generation. (arXiv:2306.16827v1 [cs.LG])
    Over recent years, denoising diffusion generative models have come to be considered as state-of-the-art methods for synthetic data generation, especially in the case of generating images. These approaches have also proved successful in other applications such as tabular and graph data generation. However, due to computational complexity, to date the application of these techniques to graph data has been restricted to small graphs, such as those used in molecular modeling. In this paper, we propose SaGess, a discrete denoising diffusion approach which is able to generate large real-world networks by augmenting a diffusion model (DiGress) with a generalized divide-and-conquer framework. The algorithm is capable of generating larger graphs by sampling a covering of subgraphs of the initial graph in order to train DiGress. SaGess then constructs a synthetic graph using the subgraphs that have been generated by DiGress. We evaluate the quality of the synthetic data sets against several competitor methods by comparing graph statistics between the original and synthetic samples, as well as by evaluating the utility of the synthetic data set produced by using it to train a task-driven model, namely link prediction. In our experiments, SaGess outperforms most of the one-shot state-of-the-art graph generating methods by a significant factor, both on the graph metrics and on the link prediction task.
    Numerical Data Imputation for Multimodal Data Sets: A Probabilistic Nearest-Neighbor Kernel Density Approach. (arXiv:2306.16906v1 [stat.ML])
    Numerical data imputation algorithms replace missing values by estimates to leverage incomplete data sets. Current imputation methods seek to minimize the error between the unobserved ground truth and the imputed values. But this strategy can create artifacts leading to poor imputation in the presence of multimodal or complex distributions. To tackle this problem, we introduce the $k$NN$\times$KDE algorithm: a data imputation method combining nearest neighbor estimation ($k$NN) and density estimation with Gaussian kernels (KDE). We compare our method with previous data imputation methods using artificial and real-world data with different data missing scenarios and various data missing rates, and show that our method can cope with complex original data structure, yields lower data imputation errors, and provides probabilistic estimates with higher likelihood than current methods. We release the code in open-source for the community: https://github.com/DeltaFloflo/knnxkde
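The combination of the two estimators is straightforward to sketch. The plain-Python toy below (`None` marks a missing value; the Euclidean distance, fixed bandwidth, and equal mixture weights are simplifying assumptions rather than the paper's exact design) builds a probabilistic estimate for one missing entry by placing a Gaussian kernel on each of the k nearest donors:

```python
import math
import random

def knn_kde_sampler(rows, target, j, k=3, h=0.25):
    """Probabilistic imputation of the missing entry rows[target][j]:
    pick the k nearest donor rows that observe column j (distance over the
    columns the target row observes), then sample from an equal-weight
    Gaussian mixture (bandwidth h) centered at the donors' values."""
    obs = [c for c in range(len(rows[target]))
           if c != j and rows[target][c] is not None]
    donors = [r for r in rows if r is not rows[target] and r[j] is not None]
    def dist(r):
        return math.sqrt(sum((r[c] - rows[target][c]) ** 2 for c in obs))
    centers = [r[j] for r in sorted(donors, key=dist)[:k]]
    def sample():
        # One draw from the neighbor-centered Gaussian mixture.
        return random.gauss(random.choice(centers), h)
    return centers, sample

rows = [[0.0, 1.0], [0.1, 1.2], [2.0, 5.0], [0.05, None]]
centers, sample = knn_kde_sampler(rows, target=3, j=1, k=2)
```

Returning a sampler rather than a single point estimate is what lets the method cope with multimodal columns: repeated draws preserve the multiple modes that a mean-minimizing imputer would average away.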
    An Efficient Virtual Data Generation Method for Reducing Communication in Federated Learning. (arXiv:2306.12088v3 [cs.LG] UPDATED)
    Communication overhead is one of the major challenges in Federated Learning (FL). A few classical schemes assume the server can extract auxiliary information about the training data of the participants from the local models to construct a central dummy dataset. The server uses the dummy dataset to finetune the aggregated global model to achieve the target test accuracy in fewer communication rounds. In this paper, we summarize the above solutions into a data-based communication-efficient FL framework. The key of the proposed framework is to design an efficient extraction module (EM) which ensures the dummy dataset has a positive effect on finetuning the aggregated global model. Different from the existing methods that use a generator to design the EM, our proposed method, FedINIBoost, borrows the idea of gradient matching to construct the EM. Specifically, FedINIBoost builds a proxy dataset of the real dataset in two steps for each participant at each communication round. Then the server aggregates all the proxy datasets to form a central dummy dataset, which is used to finetune the aggregated global model. Extensive experiments verify the superiority of our method compared with the existing classical methods FedAVG, FedProx, MOON and FedFTG. Moreover, FedINIBoost plays a significant role in finetuning the performance of the aggregated global model at the initial stage of FL.
    Group-based Robustness: A General Framework for Customized Robustness in the Real World. (arXiv:2306.16614v1 [cs.LG])
    Machine-learning models are known to be vulnerable to evasion attacks that perturb model inputs to induce misclassifications. In this work, we identify real-world scenarios where the true threat cannot be assessed accurately by existing attacks. Specifically, we find that conventional metrics measuring targeted and untargeted robustness do not appropriately reflect a model's ability to withstand attacks from one set of source classes to another set of target classes. To address the shortcomings of existing methods, we formally define a new metric, termed group-based robustness, that complements existing metrics and is better-suited for evaluating model performance in certain attack scenarios. We show empirically that group-based robustness allows us to distinguish between models' vulnerability against specific threat models in situations where traditional robustness metrics do not apply. Moreover, to measure group-based robustness efficiently and accurately, we 1) propose two loss functions and 2) identify three new attack strategies. We show empirically that with comparable success rates, finding evasive samples using our new loss functions saves computation by a factor as large as the number of targeted classes, and finding evasive samples using our new attack strategies saves time by up to 99% compared to brute-force search methods. Finally, we propose a defense method that increases group-based robustness by up to 3.52$\times$.
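To illustrate how a group-level objective differs from targeted and untargeted attacks, here is a minimal sketch (a hypothetical margin for illustration; the paper's two proposed loss functions are not reproduced here) of a quantity an attacker could minimize to push a source-class input into any class of a target group:

```python
def group_margin_loss(logits, source_label, target_classes):
    """Margin between the source class and the best class in the target
    group; minimizing it drives the input toward ANY class in the group,
    not one fixed target class."""
    best_target = max(logits[c] for c in target_classes)
    return logits[source_label] - best_target

def group_attack_success(logits, target_classes):
    """Success under the group-based notion: the prediction lands anywhere
    inside the target group."""
    pred = max(range(len(logits)), key=lambda c: logits[c])
    return pred in target_classes

logits = [2.0, 0.5, 3.1, -1.0]
```

A targeted attack would fix one class and a per-class loss; taking the max over the group is what yields the computation saving proportional to the number of targeted classes.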
    Sampling weights of deep neural networks. (arXiv:2306.16830v1 [cs.LG])
    We introduce a probability distribution, combined with an efficient sampling algorithm, for weights and biases of fully-connected neural networks. In a supervised learning context, no iterative optimization or gradient computations of internal network parameters are needed to obtain a trained network. The sampling is based on the idea of random feature models. However, instead of a data-agnostic distribution, e.g., a normal distribution, we use both the input and the output training data of the supervised learning problem to sample both shallow and deep networks. We prove that the sampled networks we construct are universal approximators. We also show that our sampling scheme is invariant to rigid body transformations and scaling of the input data. This implies many popular pre-processing techniques are no longer required. For Barron functions, we show that the $L^2$-approximation error of sampled shallow networks decreases with the square root of the number of neurons. In numerical experiments, we demonstrate that sampled networks achieve comparable accuracy as iteratively trained ones, but can be constructed orders of magnitude faster. Our test cases involve a classification benchmark from OpenML, sampling of neural operators to represent maps in function spaces, and transfer learning using well-known architectures.
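The data-driven sampling idea can be illustrated with a toy 1D regression. The sketch below follows the spirit of the method (hidden weights constructed from pairs of training points, gradients never touching the hidden layer); the normalization constants and the SGD fit of the linear output layer are simplifying assumptions:

```python
import random

def sample_relu_features(X, width):
    """Sample hidden ReLU units from pairs of training points: each unit's
    weight points from x1 toward x2, and its bias places the activation
    boundary at x1 (a data-driven choice; constants are simplified)."""
    units = []
    for _ in range(width):
        x1, x2 = random.sample(X, 2)
        d2 = sum((b - a) ** 2 for a, b in zip(x1, x2)) or 1.0
        w = [(b - a) / d2 for a, b in zip(x1, x2)]
        bias = -sum(wi * ai for wi, ai in zip(w, x1))
        units.append((w, bias))
    return units

def features(units, x):
    """Hidden-layer activations for input x."""
    return [max(0.0, sum(wi * xi for wi, xi in zip(w, x)) + b)
            for w, b in units]

def fit_output(units, X, y, steps=500, lr=0.02):
    """Least-squares fit of the linear output layer by SGD; the sampled
    hidden layer is frozen, so no backpropagation is needed."""
    c = [0.0] * len(units)
    for _ in range(steps):
        for x, t in zip(X, y):
            f = features(units, x)
            err = sum(ci * fi for ci, fi in zip(c, f)) - t
            c = [ci - lr * err * fi for ci, fi in zip(c, f)]
    return c

random.seed(1)
X = [[0.0], [0.5], [1.0]]
y = [0.0, 0.5, 1.0]
units = sample_relu_features(X, width=8)
c = fit_output(units, X, y)
mse = sum((sum(ci * fi for ci, fi in zip(c, features(units, x))) - t) ** 2
          for x, t in zip(X, y)) / len(X)
```

Because only the convex output-layer problem is solved, construction is orders of magnitude cheaper than iterative end-to-end training, which is the practical point of the approach.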
    Game Level Blending using a Learned Level Representation. (arXiv:2306.16666v1 [cs.LG])
    Game level blending via machine learning, the process of combining features of game levels to create unique and novel game levels using Procedural Content Generation via Machine Learning (PCGML) techniques, has gained increasing popularity in recent years. However, many existing techniques rely on human-annotated level representations, which restricts game level blending to the small number of annotated games. Even with annotated games, researchers often need to author an additional shared representation to make blending possible. In this paper, we present a novel approach to game level blending that employs Clustering-based Tile Embeddings (CTE), a learned level representation technique that can serve as a level representation for unannotated games and a unified level representation across games without the need for human annotation. CTE represents game level tiles as a continuous vector representation, unifying their visual, contextual, and behavioral information. We apply this approach to two classic Nintendo games, Lode Runner and The Legend of Zelda. We run an evaluation comparing the CTE representation to a common, human-annotated representation in the blending task and find that CTE has comparable or better performance without the need for human annotation.
    ArrayBot: Reinforcement Learning for Generalizable Distributed Manipulation through Touch. (arXiv:2306.16857v1 [cs.RO])
    We present ArrayBot, a distributed manipulation system consisting of a $16 \times 16$ array of vertically sliding pillars integrated with tactile sensors, which can simultaneously support, perceive, and manipulate the tabletop objects. Towards generalizable distributed manipulation, we leverage reinforcement learning (RL) algorithms for the automatic discovery of control policies. In the face of the massively redundant actions, we propose to reshape the action space by considering the spatially local action patch and the low-frequency actions in the frequency domain. With this reshaped action space, we train RL agents that can relocate diverse objects through tactile observations only. Surprisingly, we find that the discovered policy can not only generalize to unseen object shapes in the simulator but also transfer to the physical robot without any domain randomization. Leveraging the deployed policy, we present abundant real-world manipulation tasks, illustrating the vast potential of RL on ArrayBot for distributed manipulation.
    Would I have gotten that reward? Long-term credit assignment by counterfactual contribution analysis. (arXiv:2306.16803v1 [cs.LG])
    To make reinforcement learning more sample efficient, we need better credit assignment methods that measure an action's influence on future rewards. Building upon Hindsight Credit Assignment (HCA), we introduce Counterfactual Contribution Analysis (COCOA), a new family of model-based credit assignment algorithms. Our algorithms achieve precise credit assignment by measuring the contribution of actions upon obtaining subsequent rewards, by quantifying a counterfactual query: "Would the agent still have reached this reward if it had taken another action?". We show that measuring contributions w.r.t. rewarding states, as is done in HCA, results in spurious estimates of contributions, causing HCA to degrade towards the high-variance REINFORCE estimator in many relevant environments. Instead, we measure contributions w.r.t. rewards or learned representations of the rewarding objects, resulting in gradient estimates with lower variance. We run experiments on a suite of problems specifically designed to evaluate long-term credit assignment capabilities. By using dynamic programming, we measure ground-truth policy gradients and show that the improved performance of our new model-based credit assignment methods is due to lower bias and variance compared to HCA and common baselines. Our results demonstrate how modeling action contributions towards rewarding outcomes can be leveraged for credit assignment, opening a new path towards sample-efficient reinforcement learning.
    Rapid-INR: Storage Efficient CPU-free DNN Training Using Implicit Neural Representation. (arXiv:2306.16699v1 [cs.CV])
    Implicit Neural Representation (INR) is an innovative approach for representing complex shapes or objects without explicitly defining their geometry or surface structure. Instead, INR represents objects as continuous functions. Previous research has demonstrated the effectiveness of using neural networks as INR for image compression, showcasing comparable performance to traditional methods such as JPEG. However, INR holds potential for various applications beyond image compression. This paper introduces Rapid-INR, a novel approach that utilizes INR for encoding and compressing images, thereby accelerating neural network training in computer vision tasks. Our methodology involves storing the whole dataset directly in INR format on a GPU, mitigating the significant data communication overhead between the CPU and GPU during training. Additionally, the decoding process from INR to RGB format is highly parallelized and executed on-the-fly. To further enhance compression, we propose iterative and dynamic pruning, as well as layer-wise quantization, building upon previous work. We evaluate our framework on the image classification task, utilizing the ResNet-18 backbone network and three commonly used datasets with varying image sizes. Rapid-INR reduces memory consumption to only 5% of the original dataset size and achieves a maximum 6$\times$ speedup over the PyTorch training pipeline, as well as a maximum 1.2x speedup over the DALI training pipeline, with only a marginal decrease in accuracy. Importantly, Rapid-INR can be readily applied to other computer vision tasks and backbone networks with reasonable engineering efforts. Our implementation code is publicly available at https://anonymous.4open.science/r/INR-4BF7.
    "That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks. (arXiv:2204.04636v2 [cs.AI] UPDATED)
    Adversarial attacks are a major challenge faced by current machine learning research. These purposely crafted inputs fool even the most advanced models, precluding their deployment in safety-critical applications. Extensive research in computer vision has been carried out to develop reliable defense strategies. However, the same issue remains less explored in natural language processing. Our work presents a model-agnostic detector of adversarial text examples. The approach identifies patterns in the logits of the target classifier when perturbing the input text. The proposed detector improves the current state-of-the-art performance in recognizing adversarial inputs and exhibits strong generalization capabilities across different NLP models, datasets, and word-level attacks.
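    The detection idea can be sketched as follows (a hedged reading of the approach, not the paper's exact procedure): perturb the input by deleting one random word at a time and measure how far the target classifier's logits move. Adversarial texts tend to sit near decision boundaries, so their logits react more strongly to small perturbations:

    ```python
    import numpy as np

    def logit_variation_score(classify, text, n_perturb=16, seed=0):
        """Suspiciousness score for an input: mean logit shift under random
        single-word deletions. `classify` maps a word list to a logit vector
        and stands in for any target NLP classifier; a high score suggests
        the input may be adversarial."""
        rng = np.random.default_rng(seed)
        words = text.split()
        base = classify(words)
        shifts = []
        for _ in range(n_perturb):
            i = int(rng.integers(len(words)))
            shifts.append(np.linalg.norm(classify(words[:i] + words[i + 1:]) - base))
        return float(np.mean(shifts))  # threshold this score to flag attacks

    # Toy stand-in classifier whose logits simply count two marker words.
    def toy_classifier(words):
        return np.array([words.count("good"), words.count("bad")], dtype=float)

    score = logit_variation_score(toy_classifier, "good good good good good")
    ```

    A real detector would calibrate the threshold on held-out clean and attacked examples; the model-agnostic part is that only logits, not internals, are needed.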
    CDMA: A Practical Cross-Device Federated Learning Algorithm for General Minimax Problems. (arXiv:2105.14216v4 [cs.LG] UPDATED)
    Minimax problems arise in a wide range of important applications including robust adversarial learning and Generative Adversarial Network (GAN) training. Recently, algorithms for minimax problems in the Federated Learning (FL) paradigm have received considerable interest. Existing federated algorithms for general minimax problems require the full aggregation (i.e., aggregation of local model information from all clients) in each training round. Thus, they are inapplicable to an important setting of FL known as the cross-device setting, which involves numerous unreliable mobile/IoT devices. In this paper, we develop the first practical algorithm named CDMA for general minimax problems in the cross-device FL setting. CDMA is based on a Start-Immediately-With-Enough-Responses mechanism, in which the server first signals a subset of clients to perform local computation and then starts to aggregate the local results reported by clients once it receives responses from enough clients in each round. With this mechanism, CDMA is resilient to the low client availability. In addition, CDMA is incorporated with a lightweight global correction in the local update steps of clients, which mitigates the impact of slow network connections. We establish theoretical guarantees of CDMA under different choices of hyperparameters and conduct experiments on AUC maximization, robust adversarial network training, and GAN training tasks. Theoretical and experimental results demonstrate the efficiency of CDMA.
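    The Start-Immediately-With-Enough-Responses mechanism can be sketched in a few lines (an illustrative toy, with hypothetical names, not the paper's implementation): the server signals sampled clients, then aggregates as soon as enough responses arrive instead of waiting for all of them, which is what makes the round robust to unreliable devices:

    ```python
    import random

    def aggregate_with_enough_responses(client_updates, k, respond_prob=0.7, seed=0):
        """One cross-device round: poll clients in order, where each responds
        with probability `respond_prob` (modeling flaky mobile/IoT devices),
        and start aggregating as soon as `k` responses are in. Updates are
        floats here for simplicity; returns None if the round fails."""
        rng = random.Random(seed)
        received = []
        for cid, update in client_updates.items():
            if rng.random() < respond_prob:
                received.append(update)
            if len(received) >= k:        # enough responses: aggregate now
                break
        if len(received) < k:
            return None                   # too few responses this round
        return sum(received) / len(received)
    ```

    In the full algorithm this partial aggregate would feed a minimax update with the lightweight global correction; the sketch only shows the availability-resilient aggregation step.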
    End-to-end Reinforcement Learning for Online Coverage Path Planning in Unknown Environments. (arXiv:2306.16978v1 [cs.RO])
    Coverage path planning is the problem of finding the shortest path that covers the entire free space of a given confined area, with applications ranging from robotic lawn mowing and vacuum cleaning, to demining and search-and-rescue tasks. While offline methods can find provably complete, and in some cases optimal, paths for known environments, their value is limited in online scenarios where the environment is not known beforehand, especially in the presence of non-static obstacles. We propose an end-to-end reinforcement learning-based approach in continuous state and action space, for the online coverage path planning problem that can handle unknown environments. We construct the observation space from both global maps and local sensory inputs, allowing the agent to plan a long-term path, and simultaneously act on short-term obstacle detections. To account for large-scale environments, we propose to use a multi-scale map input representation. Furthermore, we propose a novel total variation reward term for eliminating thin strips of uncovered space in the learned path. To validate the effectiveness of our approach, we perform extensive experiments in simulation with a distance sensor, surpassing the performance of a recent reinforcement learning-based approach.
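    The total variation reward term for eliminating thin uncovered strips can be illustrated on a binary coverage map (a hedged reading of the idea, not the paper's exact formulation): thin gaps create many covered/uncovered transitions, so penalizing the summed absolute differences between neighboring cells discourages them:

    ```python
    import numpy as np

    def tv_penalty(coverage):
        """Total variation of a binary coverage grid: sum of absolute
        differences between vertically and horizontally adjacent cells.
        Compact covered regions score low; striped coverage with thin
        uncovered gaps scores high."""
        coverage = np.asarray(coverage, dtype=float)
        tv_rows = np.abs(np.diff(coverage, axis=0)).sum()
        tv_cols = np.abs(np.diff(coverage, axis=1)).sum()
        return float(tv_rows + tv_cols)

    # Same number of covered cells, very different penalties:
    solid   = [[1, 1, 0], [1, 1, 0], [1, 1, 0]]   # one compact block
    striped = [[1, 0, 1], [1, 0, 1], [1, 0, 1]]   # thin uncovered strip
    ```

    Subtracting such a penalty from the reward steers the learned path toward contiguous coverage rather than fragmented sweeps.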
    Did AI get more negative recently?. (arXiv:2202.13610v3 [cs.CL] UPDATED)
    In this paper, we classify scientific articles in the domain of natural language processing (NLP) and machine learning (ML), as core subfields of artificial intelligence (AI), into whether (i) they extend the current state-of-the-art by introducing novel techniques that beat existing models or whether (ii) they mainly criticize the existing state-of-the-art, i.e. argue that it is deficient with respect to some property (e.g. wrong evaluation, wrong datasets, misleading task specification). We refer to contributions under (i) as having a 'positive stance' and contributions under (ii) as having a 'negative stance' (towards related work). We annotate over 1.5k papers from NLP and ML to train a SciBERT-based model to automatically predict the stance of a paper based on its title and abstract. We then analyse large-scale trends on over 41k papers from the last approximately 35 years in NLP and ML, finding that papers have become substantially more positive over time, but that negative papers have also grown more negative, with considerably more of them appearing in recent years. Negative papers are also more influential in terms of the citations they receive.
    On the Relationship Between RNN Hidden State Vectors and Semantic Ground Truth. (arXiv:2306.16854v1 [cs.LG])
    We examine the assumption that the hidden-state vectors of recurrent neural networks (RNNs) tend to form clusters of semantically similar vectors, which we dub the clustering hypothesis. While this hypothesis has been assumed in the analysis of RNNs in recent years, its validity has not been studied thoroughly on modern neural network architectures. We examine the clustering hypothesis in the context of RNNs that were trained to recognize regular languages. This enables us to draw on perfect ground-truth automata in our evaluation, against which we can compare the RNN's accuracy and the distribution of the hidden-state vectors. We start with examining the (piecewise linear) separability of an RNN's hidden-state vectors into semantically different classes. We continue the analysis by computing clusters over the hidden-state vector space with multiple state-of-the-art unsupervised clustering approaches. We formally analyze the accuracy of computed clustering functions and the validity of the clustering hypothesis by determining whether clusters group semantically similar vectors to the same state in the ground-truth model. Our evaluation supports the validity of the clustering hypothesis in the majority of examined cases. We observed that the hidden-state vectors of well-trained RNNs are separable, and that the unsupervised clustering techniques succeed in finding clusters of similar state vectors.
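    The evaluation step, checking whether computed clusters group vectors belonging to the same ground-truth automaton state, amounts to a purity measure. A minimal sketch (our formulation; the paper's formal accuracy analysis is more involved):

    ```python
    import numpy as np

    def cluster_purity(cluster_ids, true_states):
        """Purity of a clustering of hidden-state vectors against ground-truth
        automaton states: each cluster is credited for its majority state, and
        the credited counts are summed over clusters. A purity near 1 supports
        the clustering hypothesis."""
        cluster_ids = np.asarray(cluster_ids)
        true_states = np.asarray(true_states)
        correct = 0
        for c in np.unique(cluster_ids):
            members = true_states[cluster_ids == c]
            _, counts = np.unique(members, return_counts=True)
            correct += counts.max()   # majority ground-truth state per cluster
        return correct / len(true_states)
    ```

    In practice `cluster_ids` would come from an unsupervised method (e.g. k-means over hidden states) and `true_states` from running the ground-truth automaton on the same input sequences.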
    Are Neurons Actually Collapsed? On the Fine-Grained Structure in Neural Representations. (arXiv:2306.17105v1 [cs.LG])
    Recent work has observed an intriguing ''Neural Collapse'' phenomenon in well-trained neural networks, where the last-layer representations of training samples with the same label collapse into each other. This appears to suggest that the last-layer representations are completely determined by the labels, and do not depend on the intrinsic structure of input distribution. We provide evidence that this is not a complete description, and that the apparent collapse hides important fine-grained structure in the representations. Specifically, even when representations apparently collapse, the small amount of remaining variation can still faithfully and accurately capture the intrinsic structure of the input distribution. As an example, if we train on CIFAR-10 using only 5 coarse-grained labels (by combining two classes into one super-class) until convergence, we can reconstruct the original 10-class labels from the learned representations via unsupervised clustering. The reconstructed labels achieve $93\%$ accuracy on the CIFAR-10 test set, nearly matching the normal CIFAR-10 accuracy for the same architecture. We also provide an initial theoretical result showing the fine-grained representation structure in a simplified synthetic setting. Our results show concretely how the structure of input data can play a significant role in determining the fine-grained structure of neural representations, going beyond what Neural Collapse predicts.
    The Importance of Robust Features in Mitigating Catastrophic Forgetting. (arXiv:2306.17091v1 [cs.CV])
    Continual learning (CL) is an approach to address catastrophic forgetting, which refers to forgetting previously learned knowledge by neural networks when trained on new tasks or data distributions. Work on adversarial robustness has decomposed features into robust and non-robust types and demonstrated that models trained on robust features significantly enhance adversarial robustness. However, no study has examined how effective robust features are, from the perspective of the CL model, at mitigating catastrophic forgetting in CL. In this paper, we introduce the CL robust dataset and train four baseline models on both the standard and CL robust datasets. Our results demonstrate that the CL models trained on the CL robust dataset experienced less catastrophic forgetting of the previously learned tasks than when trained on the standard dataset. Our observations highlight the significance of the features provided to the underlying CL models, showing that CL robust features can alleviate catastrophic forgetting.
    Obeying the Order: Introducing Ordered Transfer Hyperparameter Optimisation. (arXiv:2306.16916v1 [cs.LG])
    We introduce ordered transfer hyperparameter optimisation (OTHPO), a version of transfer learning for hyperparameter optimisation (HPO) where the tasks follow a sequential order. Unlike for state-of-the-art transfer HPO, the assumption is that each task is most correlated to those immediately before it. This matches many deployed settings, where hyperparameters are retuned as more data is collected; for instance tuning a sequence of movie recommendation systems as more movies and ratings are added. We propose a formal definition, outline the differences to related problems and propose a basic OTHPO method that outperforms state-of-the-art transfer HPO. We empirically show the importance of taking order into account using ten benchmarks. The benchmarks are in the setting of gradually accumulating data, and span XGBoost, random forest, approximate k-nearest neighbor, elastic net, support vector machines and a separate real-world motivated optimisation problem. We open source the benchmarks to foster future research on ordered transfer HPO.
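    The core ordering assumption can be sketched in a few lines (an illustrative toy with a hypothetical interface, not the paper's OTHPO method): tune tasks in sequence and, under a limited evaluation budget, try the previous task's best hyperparameter first, since adjacent tasks are assumed most correlated:

    ```python
    def ordered_transfer_hpo(tasks, candidates, evaluate, budget=2):
        """Sequential HPO with ordered transfer: for each task, reorder the
        candidate list so the previous task's winner is evaluated first, then
        spend only `budget` evaluations. `evaluate(task, hp)` returns a loss
        to minimize."""
        best_per_task = []
        prev_best = None
        for task in tasks:
            ordered = candidates if prev_best is None else \
                [prev_best] + [hp for hp in candidates if hp != prev_best]
            trial = ordered[:budget]   # limited evaluation budget per task
            prev_best = min(trial, key=lambda hp: evaluate(task, hp))
            best_per_task.append(prev_best)
        return best_per_task

    # Toy sequence of tasks whose optimal hyperparameter drifts slowly.
    best = ordered_transfer_hpo([1, 2], [0.1, 1.0, 10.0],
                                lambda task, hp: abs(hp - task))
    ```

    When adjacent tasks really are the most correlated, warm-starting this way wastes fewer of the budgeted evaluations than transferring from an arbitrary earlier task.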
    Principles and Guidelines for Evaluating Social Robot Navigation Algorithms. (arXiv:2306.16740v1 [cs.RO])
    A major challenge to deploying robots widely is navigation in human-populated environments, commonly referred to as social robot navigation. While the field of social navigation has advanced tremendously in recent years, the fair evaluation of algorithms that tackle social navigation remains hard because it involves not just robotic agents moving in static environments but also dynamic human agents and their perceptions of the appropriateness of robot behavior. In contrast, clear, repeatable, and accessible benchmarks have accelerated progress in fields like computer vision, natural language processing and traditional robot navigation by enabling researchers to fairly compare algorithms, revealing limitations of existing solutions and illuminating promising new directions. We believe the same approach can benefit social navigation. In this paper, we pave the road towards common, widely accessible, and repeatable benchmarking criteria to evaluate social robot navigation. Our contributions include (a) a definition of a socially navigating robot as one that respects the principles of safety, comfort, legibility, politeness, social competency, agent understanding, proactivity, and responsiveness to context, (b) guidelines for the use of metrics, development of scenarios, benchmarks, datasets, and simulators to evaluate social navigation, and (c) a design of a social navigation metrics framework to make it easier to compare results from different simulators, robots and datasets.
    Understanding the Overfitting of the Episodic Meta-training. (arXiv:2306.16873v1 [cs.LG])
    Despite the success of two-stage few-shot classification methods, in the episodic meta-training stage, the model suffers severe overfitting. We hypothesize that it is caused by over-discrimination, i.e., the model learns to over-rely on the superficial features that fit for base class discrimination while suppressing the novel class generalization. To penalize over-discrimination, we introduce knowledge distillation techniques to keep novel generalization knowledge from the teacher model during training. Specifically, we select the teacher model as the one with the best validation accuracy during meta-training and restrict the symmetric Kullback-Leibler (SKL) divergence between the output distribution of the linear classifier of the teacher model and that of the student model. This simple approach outperforms the standard meta-training process. We further propose the Nearest Neighbor Symmetric Kullback-Leibler (NNSKL) divergence for meta-training to push the limits of knowledge distillation techniques. NNSKL takes few-shot tasks as input and penalizes the output of the nearest neighbor classifier, which influences the relationships between query embeddings and support centers. By combining SKL and NNSKL in meta-training, the model achieves even better performance and surpasses state-of-the-art results on several benchmarks.
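    The SKL quantity being restricted is the standard symmetric Kullback-Leibler divergence between teacher and student output distributions; a minimal numeric sketch (standard definition, with an epsilon added for numerical safety, which the paper may handle differently):

    ```python
    import numpy as np

    def symmetric_kl(p, q, eps=1e-12):
        """SKL(p, q) = KL(p || q) + KL(q || p) between two probability
        vectors, e.g. softmax outputs of the teacher and student classifiers.
        `eps` guards against log(0) on sparse distributions."""
        p = np.asarray(p, dtype=float) + eps
        q = np.asarray(q, dtype=float) + eps
        kl_pq = np.sum(p * np.log(p / q))
        kl_qp = np.sum(q * np.log(q / p))
        return float(kl_pq + kl_qp)
    ```

    Unlike plain KL, this quantity is symmetric in its arguments, so the regularizer penalizes disagreement without privileging teacher or student as the reference distribution.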
    Solving Kernel Ridge Regression with Gradient-Based Optimization Methods. (arXiv:2306.16838v1 [stat.ML])
    Kernel ridge regression, KRR, is a non-linear generalization of linear ridge regression. Here, we introduce an equivalent formulation of the objective function of KRR, opening up both for using other penalties than the ridge penalty and for studying kernel ridge regression from the perspective of gradient descent. Using a continuous-time perspective, we derive a closed-form solution, kernel gradient flow, KGF, with regularization through early stopping, which allows us to theoretically bound the differences between KGF and KRR. We generalize KRR by replacing the ridge penalty with the $\ell_1$ and $\ell_\infty$ penalties and utilize the fact that analogously to the similarities between KGF and KRR, the solutions obtained when using these penalties are very similar to those obtained from forward stagewise regression (also known as coordinate descent) and sign gradient descent in combination with early stopping. Thus the need for computationally heavy proximal gradient descent algorithms can be alleviated. We show theoretically and empirically how these penalties, and corresponding gradient-based optimization algorithms, produce signal-driven and robust regression solutions, respectively. We also investigate kernel gradient descent where the kernel is allowed to change during training, and theoretically address the effects this has on generalization. Based on our findings, we propose an update scheme for the bandwidth of translational-invariant kernels, where we let the bandwidth decrease to zero during training, thus circumventing the need for hyper-parameter selection. We demonstrate on real and synthetic data how decreasing the bandwidth during training outperforms using a constant bandwidth, selected by cross-validation and marginal likelihood maximization. We also show that using a decreasing bandwidth, we are able to achieve both zero training error and a double descent behavior.
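    The KGF/KRR correspondence the abstract appeals to can be made concrete with a standard function-space computation (our notation, which may differ from the paper's): with kernel matrix $K$ and targets $y$, the in-sample predictions of gradient flow on the squared loss and of KRR are

    ```latex
    \hat{y}_{\mathrm{KGF}}(t) = \left(I - e^{-tK}\right) y,
    \qquad
    \hat{y}_{\mathrm{KRR}}(\lambda) = K \left(K + \lambda I\right)^{-1} y .
    ```

    Both shrink the $i$-th eigendirection of $K$ (eigenvalue $d_i$): by $1 - e^{-t d_i}$ for KGF and by $d_i/(d_i + \lambda)$ for KRR, which are close under the correspondence $\lambda \approx 1/t$. This is what lets early stopping play the role of ridge regularization and makes the bound on the KGF-KRR difference plausible.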
    MNISQ: A Large-Scale Quantum Circuit Dataset for Machine Learning on/for Quantum Computers in the NISQ era. (arXiv:2306.16627v1 [quant-ph])
    We introduce the first large-scale dataset, MNISQ, for both the Quantum and the Classical Machine Learning community during the Noisy Intermediate-Scale Quantum era. MNISQ consists of 4,950,000 data points organized in 9 subdatasets. Building our dataset from the quantum encoding of classical information (e.g., MNIST dataset), we deliver a dataset in a dual form: in quantum form, as circuits, and in classical form, as quantum circuit descriptions (quantum programming language, QASM). In fact, also the Machine Learning research related to quantum computers undertakes a dual challenge: enhancing machine learning exploiting the power of quantum computers, while also leveraging state-of-the-art classical machine learning methodologies to help the advancement of quantum computing. Therefore, we perform circuit classification on our dataset, tackling the task with both quantum and classical models. In the quantum endeavor, we test our circuit dataset with Quantum Kernel methods, and we show excellent results up to $97\%$ accuracy. In the classical world, the underlying quantum mechanical structures within the quantum circuit data are not trivial. Nevertheless, we test our dataset on three classical models: Structured State Space sequence model (S4), Transformer and LSTM. In particular, the S4 model applied on the tokenized QASM sequences reaches an impressive $77\%$ accuracy. These findings illustrate that quantum circuit-related datasets are likely to be quantum advantageous, but also that state-of-the-art machine learning methodologies can competently classify and recognize quantum circuits. We finally entrust the quantum and classical machine learning community the fundamental challenge to build more quantum-classical datasets like ours and to build future benchmarks from our experiments. The dataset is accessible on GitHub and its circuits are easily run in qulacs or qiskit.
    Assessing the Performance of 1D-Convolution Neural Networks to Predict Concentration of Mixture Components from Raman Spectra. (arXiv:2306.16621v1 [eess.SP])
    An emerging application of Raman spectroscopy is monitoring the state of chemical reactors during biologic drug production. Raman shift intensities scale linearly with the concentrations of chemical species and thus can be used to analytically determine real-time concentrations using non-destructive light irradiation in a label-free manner. Chemometric algorithms are used to interpret Raman spectra produced from complex mixtures of bioreactor contents as a reaction evolves. Finding the optimal algorithm for a specific bioreactor environment is challenging due to the lack of freely available Raman mixture datasets. The RaMix Python package addresses this challenge by enabling the generation of synthetic Raman mixture datasets with controllable noise levels to assess the utility of different chemometric algorithm types for real-time monitoring applications. To demonstrate the capabilities of this package and compare the performance of different chemometric algorithms, 48 datasets of simulated spectra were generated using the RaMix Python package. The four tested algorithms include partial least squares regression (PLS), a simple neural network, a simple convolutional neural network (simple CNN), and a 1D convolutional neural network with a ResNet architecture (ResNet). The performance of the PLS and simple CNN model was found to be comparable, with the PLS algorithm slightly outperforming the other models on $83\%$ of the datasets. The simple CNN model outperforms the other models on large, high noise datasets, demonstrating the superior capability of convolutional neural networks compared to PLS in analyzing noisy spectra. These results demonstrate the promise of CNNs to automatically extract concentration information from unprocessed, noisy spectra, allowing for better process control of industrial drug production. Code for this project is available at github.com/DexterAntonio/RaMix.
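    The linearity assumption the abstract relies on, that a mixture spectrum is the concentration-weighted sum of pure-component spectra plus noise, can be sketched directly (a hedged stand-in for what a package like RaMix might generate, not its actual API):

    ```python
    import numpy as np

    def synthetic_mixture_spectra(pure_spectra, concentrations, noise_level=0.0, seed=0):
        """Generate synthetic mixture spectra under the linear mixing model.
        Shapes: pure_spectra is (k, n_shifts), concentrations is (m, k);
        returns (m, n_shifts). Gaussian noise with scale `noise_level` models
        measurement imperfections."""
        rng = np.random.default_rng(seed)
        mixtures = concentrations @ pure_spectra        # Beer-Lambert-style linearity
        mixtures = mixtures + noise_level * rng.standard_normal(mixtures.shape)
        return mixtures

    # Two pure components on a 2-point shift axis, one 30/70 mixture:
    pure = np.array([[1.0, 0.0], [0.0, 1.0]])
    conc = np.array([[0.3, 0.7]])
    spectra = synthetic_mixture_spectra(pure, conc)
    ```

    A chemometric model (PLS, CNN, etc.) then learns the inverse map from `spectra` back to `conc`; the noise level knob is what lets one compare algorithms across clean and noisy regimes.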
    Moreau Envelope Based Difference-of-weakly-Convex Reformulation and Algorithm for Bilevel Programs. (arXiv:2306.16761v1 [math.OC])
    Recently, Ye et al. (Mathematical Programming 2023) designed an algorithm for solving a specific class of bilevel programs with an emphasis on applications related to hyperparameter selection, utilizing the difference of convex algorithm based on the value function approach reformulation. The proposed algorithm is particularly powerful when the lower level problem is fully convex, such as a support vector machine model or a least absolute shrinkage and selection operator model. In this paper, to suit more applications related to machine learning and statistics, we substantially weaken the underlying assumption from lower level full convexity to weak convexity. Accordingly, we propose a new reformulation using the Moreau envelope of the lower level problem and demonstrate that this reformulation is a difference of weakly convex program. Subsequently, we develop a sequentially convergent algorithm for solving this difference of weakly convex program. To evaluate the effectiveness of our approach, we conduct numerical experiments on the bilevel hyperparameter selection problem from elastic net, sparse group lasso, and RBF kernel support vector machine models.
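    For reference, the Moreau envelope underlying the reformulation is the standard one (standard definition; the paper's smoothing-parameter conventions may differ): for a function $g$ and $\gamma > 0$,

    ```latex
    e_{\gamma} g(x) \;=\; \min_{y} \left\{ \, g(y) + \frac{1}{2\gamma} \lVert y - x \rVert^{2} \, \right\} .
    ```

    When $g$ is $\rho$-weakly convex and $\gamma < 1/\rho$, the inner minimization is strongly convex, so $e_{\gamma} g$ is well defined and smooth; smoothing the lower level objective this way is what makes the difference-of-weakly-convex structure available.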
    Elastically-Constrained Meta-Learner for Federated Learning. (arXiv:2306.16703v1 [cs.LG])
    Federated learning is an approach to collaboratively training machine learning models for multiple parties that prohibit data sharing. One of the challenges in federated learning is non-IID data between clients, as a single model cannot fit the data distribution of all clients. Meta-learning, such as Per-FedAvg, is introduced to cope with this challenge. Meta-learning learns shared initial parameters for all clients. Each client employs gradient descent to adapt the initialization to local data distributions quickly to realize model personalization. However, due to the non-convex loss function and the randomness of sampled updates, meta-learning approaches have unstable goals in local adaptation for the same client. This fluctuation in different adaptation directions hinders convergence in meta-learning. To overcome this challenge, we use the historical locally adapted model to restrict the direction of the inner loop and propose an elastically-constrained method. As a result, the current round's inner loop keeps historical goals and adapts to better solutions. Experiments show our method boosts meta-learning convergence and improves personalization without additional calculation and communication. Our method achieved SOTA on all metrics in three public datasets.
    GuidedMixup: An Efficient Mixup Strategy Guided by Saliency Maps. (arXiv:2306.16612v1 [cs.CV])
    Data augmentation is now an essential part of the image training process, as it effectively prevents overfitting and makes the model more robust against noisy datasets. Recent mixing augmentation strategies have advanced to generate the mixup mask that can enrich the saliency information, which is a supervisory signal. However, these methods incur a significant computational burden to optimize the mixup mask. From this motivation, we propose a novel saliency-aware mixup method, GuidedMixup, which aims to retain the salient regions in mixup images with low computational overhead. We develop an efficient pairing algorithm that pursues to minimize the conflict of salient regions of paired images and achieve rich saliency in mixup images. Moreover, GuidedMixup controls the mixup ratio for each pixel to better preserve the salient region by interpolating two paired images smoothly. The experiments on several datasets demonstrate that GuidedMixup provides a good trade-off between augmentation overhead and generalization performance on classification datasets. In addition, our method shows good performance in experiments with corrupted or reduced datasets.
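    The per-pixel mixup ratio idea can be sketched as follows (a hedged sketch in the spirit of GuidedMixup, not the paper's exact scheme): instead of one global mixing coefficient, each pixel's ratio follows the relative saliency of the two paired images, so the salient regions of both inputs survive in the mix:

    ```python
    import numpy as np

    def saliency_guided_mixup(img_a, img_b, sal_a, sal_b, eps=1e-8):
        """Pixel-wise mixup guided by saliency maps. img_a/img_b have shape
        (H, W, C); sal_a/sal_b have shape (H, W). Where image A is more
        salient, its pixels dominate the mix, and vice versa. Returns the
        mixed image and a soft label weight for image A's class."""
        ratio = sal_a / (sal_a + sal_b + eps)              # per-pixel weight for A
        mixed = ratio[..., None] * img_a + (1 - ratio[..., None]) * img_b
        label_weight = float(ratio.mean())                 # label interpolation
        return mixed, label_weight

    # Degenerate check: if only image A is salient anywhere, the mix is A.
    img_a, img_b = np.ones((2, 2, 3)), np.zeros((2, 2, 3))
    mixed, w = saliency_guided_mixup(img_a, img_b, np.ones((2, 2)), np.zeros((2, 2)))
    ```

    The pairing-algorithm component of the method, choosing partners whose salient regions conflict least, would sit upstream of this mixing step.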
    Curvature-Independent Last-Iterate Convergence for Games on Riemannian Manifolds. (arXiv:2306.16617v1 [math.OC])
    Numerous applications in machine learning and data analytics can be formulated as equilibrium computation over Riemannian manifolds. Despite the extensive investigation of their Euclidean counterparts, the performance of Riemannian gradient-based algorithms remains opaque and poorly understood. We revisit the original scheme of Riemannian gradient descent (RGD) and analyze it under a geodesic monotonicity assumption, which includes the well-studied geodesically convex-concave min-max optimization problem as a special case. Our main contribution is to show that, despite the phenomenon of distance distortion, the RGD scheme, with a step size that is agnostic to the manifold's curvature, achieves a curvature-independent and linear last-iterate convergence rate in the geodesically strongly monotone setting. To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence in the Riemannian setting has not been considered before.
    Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning. (arXiv:2306.14522v2 [math.OC] UPDATED)
    The widely used stochastic gradient methods for minimizing nonconvex composite objective functions require the Lipschitz smoothness of the differentiable part. But the requirement does not hold true for problem classes including quadratic inverse problems and training neural networks. To address this issue, we investigate a family of stochastic Bregman proximal gradient (SBPG) methods, which only require smooth adaptivity of the differentiable part. SBPG replaces the upper quadratic approximation used in SGD with the Bregman proximity measure, resulting in a better approximation model that captures the non-Lipschitz gradients of the nonconvex objective. We formulate the vanilla SBPG and establish its convergence properties in the nonconvex setting without a finite-sum structure. Experimental results on quadratic inverse problems testify to the robustness of SBPG. Moreover, we propose a momentum-based version of SBPG (MSBPG) and prove it has improved convergence properties. We apply MSBPG to the training of deep neural networks with a polynomial kernel function, which ensures the smooth adaptivity of the loss function. Experimental results on representative benchmarks demonstrate the effectiveness and robustness of MSBPG in training neural networks. Since the additional computation cost of MSBPG compared with SGD is negligible in large-scale optimization, MSBPG can potentially be employed as a universal open-source optimizer in the future.
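    For orientation, the generic Bregman proximal step underlying such methods takes the following form (the generic SBPG template, not necessarily the paper's exact algorithm): with kernel $\phi$, Bregman distance $D_{\phi}(x, y) = \phi(x) - \phi(y) - \langle \nabla \phi(y), x - y \rangle$, stochastic gradient $\tilde{\nabla} f(x_k)$, regularizer $r$, and step size $\eta$,

    ```latex
    x_{k+1} \;\in\; \operatorname*{arg\,min}_{x} \left\{ \, \langle \tilde{\nabla} f(x_k),\, x \rangle + r(x) + \frac{1}{\eta}\, D_{\phi}(x, x_k) \, \right\} .
    ```

    Choosing $\phi(x) = \frac{1}{2}\lVert x \rVert^{2}$ recovers proximal SGD, while adapting $\phi$ to the objective (smooth adaptivity) is what lets the method handle non-Lipschitz gradients.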
    ViP: A Differentially Private Foundation Model for Computer Vision. (arXiv:2306.08842v2 [cs.CV] UPDATED)
    Artificial intelligence (AI) has seen a tremendous surge in capabilities thanks to the use of foundation models trained on internet-scale data. On the flip side, the uncurated nature of internet-scale data also poses significant privacy and legal risks, as they often contain personal information or copyrighted material that should not be trained on without permission. In this work, we propose as a mitigation measure a recipe to train foundation vision models with differential privacy (DP) guarantee. We identify masked autoencoders as a suitable learning algorithm that aligns well with DP-SGD, and train ViP -- a Vision transformer with differential Privacy -- under a strict privacy budget of $\epsilon=8$ on the LAION400M dataset. We evaluate the quality of representation learned by ViP using standard downstream vision tasks; in particular, ViP achieves a (non-private) linear probing accuracy of $55.7\%$ on ImageNet, comparable to that of end-to-end trained AlexNet (trained and evaluated on ImageNet). Our result suggests that scaling to internet-scale data can be practical for private learning. Code is available at \url{https://github.com/facebookresearch/ViP-MAE}.
    An Empirical Evaluation of the Rashomon Effect in Explainable Machine Learning. (arXiv:2306.15786v2 [cs.LG] UPDATED)
    The Rashomon Effect describes the following phenomenon: for a given dataset there may exist many models with equally good performance but with different solution strategies. The Rashomon Effect has implications for Explainable Machine Learning, especially for the comparability of explanations. We provide a unified view on three different comparison scenarios and conduct a quantitative evaluation across different datasets, models, attribution methods, and metrics. We find that hyperparameter-tuning plays a role and that metric selection matters. Our results provide empirical support for previously anecdotal evidence and exhibit challenges for both scientists and practitioners.
    Understanding Graph Neural Networks with Generalized Geometric Scattering Transforms. (arXiv:1911.06253v5 [stat.ML] UPDATED)
    The scattering transform is a multilayered wavelet-based deep learning architecture that acts as a model of convolutional neural networks. Recently, several works have introduced generalizations of the scattering transform for non-Euclidean settings such as graphs. Our work builds upon these constructions by introducing windowed and non-windowed geometric scattering transforms for graphs based upon a very general class of asymmetric wavelets. We show that these asymmetric graph scattering transforms have many of the same theoretical guarantees as their symmetric counterparts. As a result, the proposed construction unifies and extends known theoretical results for many of the existing graph scattering architectures. In doing so, this work helps bridge the gap between geometric scattering and other graph neural networks by introducing a large family of networks with provable stability and invariance guarantees. These results lay the groundwork for future deep learning architectures for graph-structured data that have learned filters and also provably have desirable theoretical properties.  ( 2 min )
    SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient. (arXiv:2301.11913v2 [cs.DC] UPDATED)
    Many deep learning applications benefit from using large models with billions of parameters. Training these models is notoriously expensive due to the need for specialized HPC clusters. In this work, we consider alternative setups for training large models: using cheap "preemptible" instances or pooling existing resources from multiple regions. We analyze the performance of existing model-parallel algorithms in these conditions and find configurations where training larger models becomes less communication-intensive. Based on these findings, we propose SWARM parallelism, a model-parallel training algorithm designed for poorly connected, heterogeneous and unreliable devices. SWARM creates temporary randomized pipelines between nodes that are rebalanced in case of failure. We empirically validate our findings and compare SWARM parallelism with existing large-scale training approaches. Finally, we combine our insights with compression strategies to train a large Transformer language model with 1B shared parameters (approximately 13B before sharing) on preemptible T4 GPUs with less than 200 Mb/s of network bandwidth.  ( 2 min )
    Concept-Oriented Deep Learning with Large Language Models. (arXiv:2306.17089v1 [cs.LG])
    Large Language Models (LLMs) have been successfully used in many natural-language tasks and applications including text generation and AI chatbots. They are also a promising new technology for concept-oriented deep learning (CODL). However, the prerequisite is that LLMs understand concepts and ensure conceptual consistency. We discuss these in this paper, as well as major uses of LLMs for CODL including concept extraction from text, concept graph extraction from text, and concept learning. Human knowledge consists of both symbolic (conceptual) knowledge and embodied (sensory) knowledge. Text-only LLMs, however, can represent only symbolic (conceptual) knowledge. Multimodal LLMs, on the other hand, are capable of representing the full range (conceptual and sensory) of human knowledge. We discuss conceptual understanding in visual-language LLMs, the most important multimodal LLMs, and major uses of them for CODL including concept extraction from images, concept graph extraction from images, and concept learning. While uses of LLMs for CODL are valuable standalone, they are particularly valuable as part of LLM applications such as AI chatbots.  ( 2 min )
    String Diagrams with Factorized Densities. (arXiv:2305.02506v2 [cs.PL] UPDATED)
    A growing body of research on probabilistic programs and causal models has highlighted the need to reason compositionally about model classes that extend directed graphical models. Both probabilistic programs and causal models define a joint probability density over a set of random variables, and exhibit sparse structure that can be used to reason about causation and conditional independence. This work builds on recent work on Markov categories of probabilistic mappings to define a category whose morphisms combine a joint density, factorized over each sample space, with a deterministic mapping from samples to return values. This is a step towards closing the gap between recent category-theoretic descriptions of probability measures, and the operational definitions of factorized densities that are commonly employed in probabilistic programming and causal inference.  ( 2 min )
    Secure and Fast Asynchronous Vertical Federated Learning via Cascaded Hybrid Optimization. (arXiv:2306.16077v2 [cs.LG] UPDATED)
    Vertical Federated Learning (VFL) attracts increasing attention because it empowers multiple parties to jointly train a privacy-preserving model over vertically partitioned data. Recent research has shown that applying zeroth-order optimization (ZOO) has many advantages in building a practical VFL algorithm. However, a vital problem with ZOO-based VFL is its slow convergence rate, which limits its application to modern large models. To address this problem, we propose a cascaded hybrid optimization method for VFL. In this method, the downstream models (clients) are trained with ZOO to protect privacy and ensure that no internal information is shared. Meanwhile, the upstream model (server) is updated with first-order optimization (FOO) locally, which significantly improves the convergence rate, making it feasible to train large models without compromising privacy or security. We theoretically prove that our VFL framework converges faster than ZOO-based VFL, as the convergence of our framework is not limited by the size of the server model, making it effective for training large models with the major part on the server. Extensive experiments demonstrate that our method achieves faster convergence than the ZOO-based VFL framework, while maintaining an equivalent level of privacy protection. Moreover, we show that the convergence of our VFL is comparable to the unsafe FOO-based VFL baseline. Additionally, we demonstrate that our method makes the training of a large model feasible.  ( 3 min )
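The zeroth-order side of the scheme can be illustrated with the classic two-point gradient estimator, which needs only function evaluations rather than backpropagated gradients. This is a hedged sketch of the generic estimator, not the paper's exact client update; `mu` and `n_samples` are illustrative.

```python
import random

def zoo_gradient(f, x, mu=1e-4, n_samples=200):
    """Two-point zeroth-order gradient estimate of f at x:
    average of (f(x + mu*u) - f(x)) / mu * u over random Gaussian directions u."""
    dim = len(x)
    fx = f(x)
    grad = [0.0] * dim
    for _ in range(n_samples):
        u = [random.gauss(0.0, 1.0) for _ in range(dim)]
        coeff = (f([xi + mu * ui for xi, ui in zip(x, u)]) - fx) / mu
        for j in range(dim):
            grad[j] += coeff * u[j] / n_samples
    return grad
```

The estimator's variance scales with the dimension, which is exactly the slow-convergence issue the abstract describes for large models.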
    Algorithmic Censoring in Dynamic Learning Systems. (arXiv:2305.09035v2 [cs.LG] UPDATED)
    Dynamic learning systems subject to selective labeling exhibit censoring, i.e. persistent negative predictions assigned to one or more subgroups of points. In applications like consumer finance, this results in groups of applicants that are persistently denied and thus never enter into the training data. In this work, we formalize censoring, demonstrate how it can arise, and highlight difficulties in detection. We consider safeguards against censoring - recourse and randomized-exploration - both of which ensure we collect labels for points that would otherwise go unobserved. The resulting techniques allow examples from censored groups to enter into the training data and correct the model. Our results highlight the otherwise unmeasured harms of censoring and demonstrate the effectiveness of mitigation strategies across a range of data generating processes.  ( 2 min )
    Magnitude Invariant Parametrizations Improve Hypernetwork Learning. (arXiv:2304.07645v2 [cs.LG] UPDATED)
    Hypernetworks, neural networks that predict the parameters of another neural network, are powerful models that have been successfully used in diverse applications from image generation to multi-task learning. Unfortunately, existing hypernetworks are often challenging to train. Training typically converges far more slowly than for non-hypernetwork models, and the rate of convergence can be very sensitive to hyperparameter choices. In this work, we identify a fundamental and previously unidentified problem that contributes to the challenge of training hypernetworks: a magnitude proportionality between the inputs and outputs of the hypernetwork. We demonstrate both analytically and empirically that this can lead to unstable optimization, thereby slowing down convergence, and sometimes even preventing any learning. We present a simple solution to this problem using a revised hypernetwork formulation that we call Magnitude Invariant Parametrizations (MIP). We demonstrate the proposed solution on several hypernetwork tasks, where it consistently stabilizes training and achieves faster convergence. Furthermore, we perform a comprehensive ablation study including choices of activation function, normalization strategies, input dimensionality, and hypernetwork architecture; and find that MIP improves training in all scenarios. We provide easy-to-use code that can turn existing networks into MIP-based hypernetworks.  ( 2 min )
    Scale-Space Hypernetworks for Efficient Biomedical Imaging. (arXiv:2304.05448v2 [cs.CV] UPDATED)
    Convolutional Neural Networks (CNNs) are the predominant model used for a variety of medical image analysis tasks. At inference time, these models are computationally intensive, especially with volumetric data. In principle, it is possible to trade accuracy for computational efficiency by manipulating the rescaling factor in the downsample and upsample layers of CNN architectures. However, properly exploring the accuracy-efficiency trade-off is prohibitively expensive with existing models. To address this, we introduce Scale-Space HyperNetworks (SSHN), a method that learns a spectrum of CNNs with varying internal rescaling factors. A single SSHN characterizes an entire Pareto accuracy-efficiency curve of models that match, and occasionally surpass, the outcomes of training many separate networks with fixed rescaling factors. We demonstrate the proposed approach in several medical image analysis applications, comparing SSHN against strategies with both fixed and dynamic rescaling factors. We find that SSHN consistently provides a better accuracy-efficiency trade-off at a fraction of the training cost. Trained SSHNs enable the user to quickly choose a rescaling factor that appropriately balances accuracy and computational efficiency for their particular needs at inference.  ( 2 min )
    Improving Identity-Robustness for Face Models. (arXiv:2304.03838v2 [cs.CV] UPDATED)
    Despite the success of deep-learning models in many tasks, there have been concerns about such models learning shortcuts, and their lack of robustness to irrelevant confounders. When it comes to models directly trained on human faces, a sensitive confounder is that of human identities. Many face-related tasks should ideally be identity-independent, and perform uniformly across different individuals (i.e. be fair). One way to measure and enforce such robustness and performance uniformity is through enforcing it during training, assuming identity-related information is available at scale. However, due to privacy concerns and also the cost of collecting such information, this is often not the case, and most face datasets simply contain input images and their corresponding task-related labels. Thus, improving identity-related robustness without the need for such annotations is of great importance. Here, we explore using face-recognition embedding vectors, as proxies for identities, to enforce such robustness. We propose to use the structure in the face-recognition embedding space, to implicitly emphasize rare samples within each class. We do so by weighting samples according to their conditional inverse density (CID) in the proxy embedding space. Our experiments suggest that such a simple sample-weighting scheme not only improves training robustness but often improves overall performance as a result. We also show that employing such constraints during training results in models that are significantly less sensitive to different levels of bias in the dataset.  ( 3 min )
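The weighting idea can be sketched with a simple kernel-density version of CID: estimate each sample's density among same-class embeddings and weight by its inverse, so rare samples within a class are emphasized. This is a hypothetical minimal implementation; the paper's exact density estimator may differ.

```python
import math

def cid_weights(embeddings, labels, bandwidth=1.0):
    """Weight each sample by the inverse of its Gaussian-kernel density
    estimated over same-class embeddings; rare samples get larger weight."""
    n = len(embeddings)
    raw = []
    for i in range(n):
        density, count = 0.0, 0
        for j in range(n):
            if labels[j] != labels[i]:
                continue
            count += 1
            d2 = sum((a - b) ** 2 for a, b in zip(embeddings[i], embeddings[j]))
            density += math.exp(-d2 / (2.0 * bandwidth ** 2))
        raw.append(count / density)  # inverse of mean within-class density
    mean_w = sum(raw) / n
    return [w / mean_w for w in raw]  # normalize to mean 1
```

An embedding far from the rest of its class receives a weight above 1, while densely clustered samples are down-weighted.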
    SVDinsTN: An Integrated Method for Tensor Network Representation with Efficient Structure Search. (arXiv:2305.14912v2 [cs.LG] UPDATED)
    Tensor network (TN) representation is a powerful technique for data analysis and machine learning. It practically involves a challenging TN structure search (TN-SS) problem, which aims to search for the optimal structure to achieve a compact representation. Existing TN-SS methods mainly adopt a bi-level optimization method that leads to excessive computational costs due to repeated structure evaluations. To address this issue, we propose an efficient integrated (single-level) method named SVD-inspired TN decomposition (SVDinsTN), eliminating the need for repeated tedious structure evaluation. By inserting a diagonal factor for each edge of the fully-connected TN, we calculate TN cores and diagonal factors simultaneously, with factor sparsity revealing the most compact TN structure. Experimental results on real-world data demonstrate that SVDinsTN achieves approximately $10\sim{}10^3$ times acceleration in runtime compared to the existing TN-SS methods while maintaining a comparable level of representation ability.  ( 2 min )
    Mastering Percolation-like Games with Deep Learning. (arXiv:2305.07687v2 [cs.LG] UPDATED)
    Though robustness of networks to random attacks has been widely studied, intentional destruction by an intelligent agent is not tractable with previous methods. Here we devise a single-player game on a lattice that mimics the logic of an attacker attempting to destroy a network. The objective of the game is to disable all nodes in the fewest number of steps. We develop a reinforcement learning approach using deep Q-learning that is capable of learning to play this game successfully, and in so doing, to optimally attack a network. Because the learning algorithm is universal, we train agents on different definitions of robustness and compare the learned strategies. We find that superficially similar definitions of robustness induce different strategies in the trained agent, implying that optimally attacking or defending a network is sensitive to the particular objective. Our method provides a new approach to understand network robustness, with potential applications to other discrete processes in disordered systems.  ( 2 min )
    Learning Mixtures of Gaussians with Censored Data. (arXiv:2305.04127v2 [cs.LG] UPDATED)
    We study the problem of learning mixtures of Gaussians with censored data. Statistical learning with censored data is a classical problem, with numerous practical applications, however, finite-sample guarantees for even simple latent variable models such as Gaussian mixtures are missing. Formally, we are given censored data from a mixture of univariate Gaussians $$ \sum_{i=1}^k w_i \mathcal{N}(\mu_i,\sigma^2), $$ i.e. the sample is observed only if it lies inside a set $S$. The goal is to learn the weights $w_i$ and the means $\mu_i$. We propose an algorithm that takes only $\frac{1}{\varepsilon^{O(k)}}$ samples to estimate the weights $w_i$ and the means $\mu_i$ within $\varepsilon$ error.  ( 2 min )
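Under censoring to a set $S$ (here an interval $[a, b]$ for concreteness), the likelihood of an observed sample is the mixture density renormalized by the mixture's mass inside $S$. A small sketch of that censored log-likelihood, which an estimation procedure would maximize:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def censored_mixture_loglik(samples, weights, means, sigma, a, b):
    """Log-likelihood of samples observed only when they fall in S = [a, b]:
    the mixture density is renormalized by the total mixture mass inside S."""
    mass = sum(w * (normal_cdf(b, m, sigma) - normal_cdf(a, m, sigma))
               for w, m in zip(weights, means))
    return sum(math.log(sum(w * normal_pdf(x, m, sigma)
                            for w, m in zip(weights, means)) / mass)
               for x in samples)
```

When $[a, b]$ covers essentially the whole line, the renormalizing mass is 1 and the ordinary mixture log-likelihood is recovered.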
    Variable Importance Matching for Causal Inference. (arXiv:2302.11715v2 [stat.ME] UPDATED)
    Our goal is to produce methods for observational causal inference that are auditable, easy to troubleshoot, accurate for treatment effect estimation, and scalable to high-dimensional data. We describe a general framework called Model-to-Match that achieves these goals by (i) learning a distance metric via outcome modeling, (ii) creating matched groups using the distance metric, and (iii) using the matched groups to estimate treatment effects. Model-to-Match uses variable importance measurements to construct a distance metric, making it a flexible framework that can be adapted to various applications. Concentrating on the scalability of the problem in the number of potential confounders, we operationalize the Model-to-Match framework with LASSO. We derive performance guarantees for settings where LASSO outcome modeling consistently identifies all confounders (importantly without requiring the linear model to be correctly specified). We also provide experimental results demonstrating the method's auditability, accuracy, and scalability as well as extensions to more general nonparametric outcome modeling.  ( 2 min )
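Steps (i)-(iii) can be sketched in miniature: treat absolute LASSO coefficients from an outcome model as variable importances, use them to weight a distance, and match each treated unit to its nearest control. All names and the coefficients below are illustrative stand-ins for a fitted model, not the paper's exact operationalization.

```python
def importance_distance(x, z, importances):
    """Distance between two covariate vectors weighted by variable
    importances (e.g., absolute LASSO coefficients of an outcome model)."""
    return sum(w * abs(a - b) for w, a, b in zip(importances, x, z))

def match_controls(treated, controls, importances):
    """For each treated unit, return the index of its nearest control
    under the importance-weighted distance."""
    return [min(range(len(controls)),
                key=lambda j: importance_distance(t, controls[j], importances))
            for t in treated]
```

Covariates the outcome model deems irrelevant get zero weight and drop out of the matching entirely, which is what makes the resulting matched groups auditable.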
    Sparsity exploitation via discovering graphical models in multi-variate time-series forecasting. (arXiv:2306.17090v1 [cs.LG])
    Graph neural networks (GNNs) have been widely applied in multi-variate time-series forecasting (MTSF) tasks because of their capability in capturing the correlations among different time-series. These graph-based learning approaches improve the forecasting performance by discovering and understanding the underlying graph structures, which represent the data correlation. When explicit prior graph structures are not available, most existing works cannot guarantee the sparsity of the generated graphs, which makes the overall model computationally expensive and less interpretable. In this work, we propose a decoupled training method, which includes a graph generating module and a GNN forecasting module. First, we use Graphical Lasso (or GraphLASSO) to directly exploit the sparsity pattern from data to build graph structures in both static and time-varying cases. Second, we feed these graph structures and the input data into a Graph Convolutional Recurrent Network (GCRN) to train a forecasting model. The experimental results on three real-world datasets show that our novel approach has competitive performance against existing state-of-the-art forecasting algorithms while providing sparse, meaningful and explainable graph structures and reducing training time by approximately 40%. Our PyTorch implementation is publicly available at https://github.com/HySonLab/GraphLASSO.  ( 2 min )
    Meta-Calibration Regularized Neural Networks. (arXiv:2303.15057v2 [cs.LG] UPDATED)
    Miscalibration, the mismatch between predicted probability and the true correctness likelihood, has been frequently identified in modern deep neural networks. Recent work in the field aims to address this problem by training calibrated models directly, optimizing a proxy of the calibration error alongside the conventional objective. Recently, Meta-Calibration (MC) showed the effectiveness of using meta-learning for learning better calibrated models. In this work, we extend MC with two main components: (1) a gamma network (gamma-net), a meta network that learns a sample-wise gamma in a continuous space for the focal loss used to optimize the backbone network; (2) smooth expected calibration error (SECE), a Gaussian-kernel-based, unbiased, and differentiable ECE that aims to smoothly optimize gamma-net. The proposed method regularizes neural networks towards better calibration while retaining predictive performance. Our experiments show that (a) learning sample-wise gamma in a continuous space can effectively perform calibration; (b) SECE smoothly optimizes gamma-net towards better robustness to binning schemes; (c) the combination of gamma-net and SECE achieves the best calibration performance across various calibration metrics while retaining very competitive predictive performance compared to multiple recently proposed methods on three datasets.  ( 2 min )
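The sample-wise focal loss at the core of the approach is simple to write down. In the paper, each sample's gamma would come from the meta-learned gamma-net; this sketch just passes the gammas in directly.

```python
import math

def focal_loss(p_true, gamma):
    """Focal loss -(1 - p)^gamma * log(p) for one sample, where p is the
    predicted probability of the true class; gamma = 0 recovers cross-entropy."""
    return -((1.0 - p_true) ** gamma) * math.log(p_true)

def batch_focal_loss(probs, gammas):
    """Mean focal loss with a separate (e.g., meta-learned) gamma per sample."""
    return sum(focal_loss(p, g) for p, g in zip(probs, gammas)) / len(probs)
```

Larger gammas shrink the loss contribution of already-confident (easy) samples, which is the lever a per-sample gamma gives the meta-learner.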
    Deep Learning for Energy Time-Series Analysis and Forecasting. (arXiv:2306.09129v2 [cs.LG] UPDATED)
    Energy time-series analysis describes the process of analyzing past energy observations and possibly external factors so as to predict the future. Different tasks are involved in the general field of energy time-series analysis and forecasting, with electric load demand forecasting, personalized energy consumption forecasting, as well as renewable energy generation forecasting being among the most common ones. Following the exceptional performance of Deep Learning (DL) in a broad area of vision tasks, DL models have successfully been utilized in time-series forecasting tasks. This paper aims to provide insight into various DL methods geared towards improving the performance in energy time-series forecasting tasks, with special emphasis on the Greek energy market, and equip the reader with the necessary knowledge to apply these methods in practice.  ( 2 min )
    Improving Patient Pre-screening for Clinical Trials: Assisting Physicians with Large Language Models. (arXiv:2304.07396v2 [cs.LG] UPDATED)
    Physicians considering clinical trials for their patients are met with the laborious process of checking many text based eligibility criteria. Large Language Models (LLMs) have been shown to perform well for clinical information extraction and clinical reasoning, including medical tests, but not yet in real-world scenarios. This paper investigates the use of InstructGPT to assist physicians in determining eligibility for clinical trials based on a patient's summarised medical profile. Using a prompting strategy combining one-shot, selection-inference and chain-of-thought techniques, we investigate the performance of LLMs on 10 synthetically created patient profiles. Performance is evaluated at four levels: ability to identify screenable eligibility criteria from a trial given a medical profile; ability to classify for each individual criterion whether the patient qualifies; the overall classification of whether a patient is eligible for a clinical trial; and the percentage of criteria to be screened by the physician. We evaluated against 146 clinical trials and a total of 4,135 eligibility criteria. The LLM was able to correctly identify the screenability of 72% (2,994/4,135) of the criteria. Additionally, 72% (341/471) of the screenable criteria were evaluated correctly. The resulting trial-level classification as eligible or ineligible achieved a recall of 0.5. By leveraging LLMs with a physician-in-the-loop, a recall of 1.0 and precision of 0.71 at the clinical trial level can be achieved while reducing the number of criteria to be checked by an estimated 90%. LLMs can be used to assist physicians with pre-screening of patients for clinical trials. By forcing instruction-tuned LLMs to produce chain-of-thought responses, the reasoning can be made transparent to physicians and the decision process becomes open to their scrutiny, thereby making such a system feasible for use in real-world scenarios.  ( 3 min )
    Orbit Classification of asteroids using implementation of radial Basis Function on Support Vector Machines. (arXiv:2306.17138v1 [astro-ph.EP])
    This research paper focuses on the implementation of radial basis function (RBF) Support Vector Machines (SVMs) for classifying asteroid orbits. Asteroids are important astronomical objects, and their orbits play a crucial role in understanding the dynamics of the solar system. The International Astronomical Union maintains data archives that provide a playground to experiment with various machine-learning techniques. In this study, we explore the application of the RBF SVM algorithm to classify asteroids. The results show that the RBF SVM algorithm achieves good efficiency and accuracy on this dataset. We also analyze the impact of various parameters on the performance of the RBF SVM algorithm and present the optimal parameter settings. Our study highlights the importance of using machine learning techniques for classifying asteroid orbits and the effectiveness of the RBF SVM algorithm in this regard.  ( 2 min )
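The RBF kernel at the heart of the classifier, and the resulting SVM decision function over support vectors, can be sketched as follows. The support vectors, dual coefficients, and bias below are placeholders standing in for a trained model, not values from the paper.

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """k(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def svm_decision(x, support_vectors, dual_coefs, bias, gamma=1.0):
    """Predict the sign of sum_i alpha_i * y_i * k(x_i, x) + b, where
    dual_coefs already folds the labels into the alphas."""
    score = bias + sum(c * rbf_kernel(sv, x, gamma)
                       for sv, c in zip(support_vectors, dual_coefs))
    return 1 if score >= 0 else -1
```

Because the kernel decays with squared distance, points far from every support vector fall back to the sign of the bias, which is what makes `gamma` such an influential hyperparameter.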
    Interpretable Water Level Forecaster with Spatiotemporal Causal Attention Mechanisms. (arXiv:2303.00515v6 [cs.LG] UPDATED)
    Forecasting the water level of the Han River is essential to control traffic and avoid natural disasters. The stream flow of the Han River is affected by various and intricately connected factors. Thus, a simple forecasting machine frequently fails to capture its serial pattern. On the other hand, a complex predictive model loses the interpretability of the model output. This work proposes a neural network model with a novel transformer exploiting a causal relationship based on prior knowledge. The transformer consists of spatiotemporal attention weight that describes the spatial and temporal causation with multilayer networks with masking. Our model has two distinguished advantages against the existing spatiotemporal forecasting models. First, the model allows the heterogeneous predictors for each site such that a flexible regression is applicable to the causal network. Next, the model is adapted to partially identified causal structures. As a result, we have relaxed the constraints of the applicable causal network through our model. In real data analysis, we use the Han River dataset from 2016 to 2021, compare the proposed model with deep learning models, and confirm that our model provides an interpretable and consistent model with prior knowledge, such as a seasonality arising from the tidal force. Furthermore, in prediction performance, our model is better than or competitive with the state-of-the-art models.  ( 3 min )
    An image segmentation algorithm based on multi-scale feature pyramid network. (arXiv:2305.10631v4 [eess.IV] UPDATED)
    Medical image segmentation is particularly critical as a prerequisite for relevant quantitative analysis in the treatment of clinical diseases. For example, in clinical cervical cancer radiotherapy, after acquiring subabdominal MRI images, fast and accurate segmentation of organs and tumors in the MRI images can optimize the clinical radiotherapy process. Traditional approaches rely on manual annotation by specialist doctors, which is time-consuming and laborious; automatic organ segmentation of subabdominal MRI images is therefore a valuable research topic.  ( 2 min )
    Log-linear Guardedness and its Implications. (arXiv:2210.10012v2 [cs.LG] UPDATED)
    Methods for erasing human-interpretable concepts from neural representations that assume linearity have been found to be tractable and useful. However, the impact of this removal on the behavior of downstream classifiers trained on the modified representations is not fully understood. In this work, we formally define the notion of log-linear guardedness as the inability of an adversary to predict the concept directly from the representation, and study its implications. We show that, in the binary case, under certain assumptions, a downstream log-linear model cannot recover the erased concept. However, we demonstrate that a multiclass log-linear model \emph{can} be constructed that indirectly recovers the concept in some cases, pointing to the inherent limitations of log-linear guardedness as a downstream bias mitigation technique. These findings shed light on the theoretical limitations of linear erasure methods and highlight the need for further research on the connections between intrinsic and extrinsic bias in neural models.  ( 2 min )
    Temporal Robustness against Data Poisoning. (arXiv:2302.03684v2 [cs.LG] UPDATED)
    Data poisoning considers cases when an adversary manipulates the behavior of machine learning algorithms through malicious training data. Existing threat models of data poisoning center around a single metric, the number of poisoned samples. In consequence, if attackers can poison more samples than expected with affordable overhead, as in many practical scenarios, they may be able to render existing defenses ineffective in a short time. To address this issue, we leverage timestamps denoting the birth dates of data, which are often available but neglected in the past. Benefiting from these timestamps, we propose a temporal threat model of data poisoning with two novel metrics, earliness and duration, which respectively measure how far in advance an attack started and how long it lasted. Using these metrics, we define the notions of temporal robustness against data poisoning, providing a meaningful sense of protection even with unbounded amounts of poisoned samples. We present a benchmark with an evaluation protocol simulating continuous data collection and periodic deployments of updated models, thus enabling empirical evaluation of temporal robustness. Lastly, we develop and also empirically verify a baseline defense, namely temporal aggregation, offering provable temporal robustness and highlighting the potential of our temporal threat model for data poisoning.  ( 2 min )
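The aggregation idea can be sketched as a majority vote over models trained on data collected up to different cut-off dates: an attack of bounded duration can only influence the models whose training windows overlap it. This is a minimal illustration of the voting step, not the paper's exact protocol.

```python
from collections import Counter

def temporal_aggregation(predictions):
    """Majority vote over predictions from models trained on data up to
    different timestamps; poison of bounded duration flips only the
    models whose training windows include it, so a majority of clean
    models can still outvote the compromised ones."""
    return Counter(predictions).most_common(1)[0][0]
```

If at most k of n models can be affected by an attack of a given earliness and duration, the vote is provably unchanged whenever the clean margin exceeds 2k, which is the flavor of guarantee a temporal threat model enables.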
    Dynamic-Resolution Model Learning for Object Pile Manipulation. (arXiv:2306.16700v1 [cs.RO])
    Dynamics models learned from visual observations have been shown to be effective in various robotic manipulation tasks. One of the key questions for learning such dynamics models is what scene representation to use. Prior works typically assume representation at a fixed dimension or resolution, which may be inefficient for simple tasks and ineffective for more complicated tasks. In this work, we investigate how to learn dynamic and adaptive representations at different levels of abstraction to achieve the optimal trade-off between efficiency and effectiveness. Specifically, we construct dynamic-resolution particle representations of the environment and learn a unified dynamics model using graph neural networks (GNNs) that allows continuous selection of the abstraction level. During test time, the agent can adaptively determine the optimal resolution at each model-predictive control (MPC) step. We evaluate our method in object pile manipulation, a task we commonly encounter in cooking, agriculture, manufacturing, and pharmaceutical applications. Through comprehensive evaluations both in the simulation and the real world, we show that our method achieves significantly better performance than state-of-the-art fixed-resolution baselines at the gathering, sorting, and redistribution of granular object piles composed of various materials such as coffee beans, almonds, and corn.  ( 2 min )
    Non-Convex Optimizations for Machine Learning with Theoretical Guarantee: Robust Matrix Completion and Neural Network Learning. (arXiv:2306.16557v1 [cs.LG])
    Despite the recent development in machine learning, most learning systems still operate as a "black box", whose performance cannot be understood or derived. With the rise of safety and privacy concerns in public, designing an explainable learning system has become a new trend in machine learning. In general, many machine learning problems are formulated as minimizing (or maximizing) some loss function. Since real data are most likely generated from non-linear models, the loss function is non-convex in general. Unlike in convex optimization, gradient descent algorithms can be trapped in spurious local minima when solving non-convex optimization problems. Therefore, it is challenging to provide explainable algorithms when studying non-convex optimization problems. In this thesis, two popular non-convex problems are studied: (1) low-rank matrix completion and (2) neural network learning.  ( 2 min )
    Towards Optimal Randomized Strategies in Adversarial Example Game. (arXiv:2306.16738v1 [cs.LG])
    The vulnerability of deep neural network models to adversarial example attacks is a practical challenge in many artificial intelligence applications. A recent line of work shows that the use of randomization in adversarial training is the key to finding optimal strategies against adversarial example attacks. However, in a fully randomized setting where both the defender and the attacker can use randomized strategies, there is no efficient algorithm for finding such an optimal strategy. To fill the gap, we propose the first algorithm of its kind, called FRAT, which models the problem with a new infinite-dimensional continuous-time flow on probability distribution spaces. FRAT maintains a lightweight mixture of models for the defender, with flexibility to efficiently update mixing weights and model parameters at each iteration. Furthermore, FRAT utilizes lightweight sampling subroutines to construct a random strategy for the attacker. We prove that the continuous-time limit of FRAT converges to a mixed Nash equilibrium in a zero-sum game formed by a defender and an attacker. Experimental results also demonstrate the efficiency of FRAT on CIFAR-10 and CIFAR-100 datasets.  ( 2 min )
    Performance Analysis of DNN Inference/Training with Convolution and non-Convolution Operations. (arXiv:2306.16767v1 [cs.AR])
    Today's performance analysis frameworks for deep learning accelerators suffer from two significant limitations. First, although modern convolutional neural networks (CNNs) consist of many types of layers other than convolution, especially during training, these frameworks largely focus on convolution layers only. Second, these frameworks are generally targeted towards inference and lack support for training operations. This work proposes a novel performance analysis framework, SimDIT, for general ASIC-based systolic hardware accelerator platforms. The modeling effort of SimDIT comprehensively covers convolution and non-convolution operations of both CNN inference and training on a highly parameterizable hardware substrate. SimDIT is integrated with a backend silicon implementation flow and provides detailed end-to-end performance statistics (i.e., data access cost, cycle counts, energy, and power) for executing CNN inference and training workloads. SimDIT-enabled performance analysis reveals that on a 64X64 processing array, non-convolution operations constitute 59.5% of the total runtime of a ResNet-50 training workload. In addition, by optimally distributing available off-chip DRAM bandwidth and on-chip SRAM resources, SimDIT achieves an 18X performance improvement over a generic static resource allocation for ResNet-50 inference.  ( 2 min )
    NaturalInversion: Data-Free Image Synthesis Improving Real-World Consistency. (arXiv:2306.16661v1 [cs.CV])
    We introduce NaturalInversion, a novel model-inversion-based method to synthesize images that agree well with the original data distribution without using real data. In NaturalInversion, we propose: (1) a Feature Transfer Pyramid, which uses an enhanced image prior of the original data by combining the multi-scale feature maps extracted from the pre-trained classifier; (2) a one-to-one generative approach, where each batch of images is synthesized by a single generator to bring non-linearity to the optimization and to ease the overall optimization process; and (3) learnable Adaptive Channel Scaling parameters, which are trained end-to-end to scale the output image channels to further utilize the original image prior. With NaturalInversion, we synthesize images from classifiers trained on CIFAR-10/100 and show through visualization and additional analysis that our images are more consistent with the original data distribution than those of prior works. Furthermore, our synthesized images outperform prior works on various applications such as knowledge distillation and pruning, demonstrating the effectiveness of our proposed method.  ( 2 min )
    Feature Selection: A perspective on inter-attribute cooperation. (arXiv:2306.16559v1 [cs.LG])
    High-dimensional datasets pose a challenge for learning tasks in data mining and machine learning. Feature selection is an effective technique for dimensionality reduction. It is often an essential data processing step prior to applying a learning algorithm. Over the decades, filter feature selection methods have evolved from simple univariate relevance-ranking algorithms to more sophisticated relevance-redundancy trade-offs and, in recent years, to multivariate dependency-based approaches. This tendency to capture multivariate dependence aims at obtaining unique information about the class from the intercooperation among features. This paper presents a comprehensive survey of the state-of-the-art work on filter feature selection methods assisted by feature intercooperation, and summarizes the contributions of the different approaches found in the literature. Furthermore, current issues and challenges are introduced to identify promising directions for future research and development.  ( 2 min )
    Improving Fairness in Deepfake Detection. (arXiv:2306.16635v1 [cs.CV])
    Despite the development of effective deepfake detection models in recent years, several recent studies have demonstrated that biases in the training data used to develop deepfake detection models can lead to unfair performance across demographic groups of different races and/or genders. Such biases can result in these groups being unfairly targeted or excluded from detection, allowing misclassified deepfakes to manipulate public opinion and erode trust in the model. While these studies have focused on identifying and evaluating unfairness in deepfake detection, no methods have been developed to address the fairness issue of deepfake detection at the algorithm level. In this work, we make the first attempt to improve deepfake detection fairness by proposing novel loss functions to train fair deepfake detection models in ways that are either agnostic or aware of demographic factors. Extensive experiments on four deepfake datasets and five deepfake detectors demonstrate the effectiveness and flexibility of our approach in improving deepfake detection fairness.  ( 2 min )
    DNA-TEQ: An Adaptive Exponential Quantization of Tensors for DNN Inference. (arXiv:2306.16430v1 [cs.LG])
    Quantization is commonly used in Deep Neural Networks (DNNs) to reduce storage and computational complexity by decreasing the arithmetical precision of activations and weights, a.k.a. tensors. Efficient hardware architectures employ linear quantization to enable the deployment of recent DNNs onto embedded systems and mobile devices. However, linear uniform quantization cannot usually reduce the numerical precision to less than 8 bits without sacrificing high performance in terms of model accuracy. The performance loss is due to the fact that tensors do not follow uniform distributions. In this paper, we show that a significant number of tensors fit an exponential distribution. We then propose DNA-TEQ, which exponentially quantizes DNN tensors with an adaptive scheme that achieves the best trade-off between numerical precision and accuracy loss. The experimental results show that DNA-TEQ provides a much lower quantization bit-width than previous proposals, resulting in an average compression ratio of 40% over the linear INT8 baseline, with negligible accuracy loss and without retraining the DNNs. In addition, DNA-TEQ leads the way in performing dot-product operations in the exponential domain, which saves 66% of energy consumption on average for a set of widely used DNNs.  ( 2 min )
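The paper's adaptive DNA-TEQ scheme is not detailed in the abstract; the underlying idea of exponential quantization (snapping each value to the nearest signed power of a base, so dot products can work on exponents) can be sketched as follows, with hypothetical base and exponent-range parameters:

```python
import math

def exp_quantize(x, base=2.0, min_exp=-8, max_exp=0):
    """Map a value to the nearest signed power of `base` (0 maps to 0),
    clipping the exponent to a small representable range."""
    if x == 0.0:
        return 0.0
    e = round(math.log(abs(x), base))
    e = max(min_exp, min(max_exp, e))  # clip to the exponent budget
    return math.copysign(base ** e, x)

print(exp_quantize(0.3))   # -> 0.25, since 2**-2 is the nearest power of two
print(exp_quantize(-0.7))  # -> -0.5
```

Because only the exponent (a few bits) and a sign need to be stored, such a code is far narrower than INT8, at the cost of coarse resolution between adjacent powers.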
    Finite-Sample Symmetric Mean Estimation with Fisher Information Rate. (arXiv:2306.16573v1 [math.ST])
    The mean of an unknown variance-$\sigma^2$ distribution $f$ can be estimated from $n$ samples with variance $\frac{\sigma^2}{n}$ and nearly corresponding subgaussian rate. When $f$ is known up to translation, this can be improved asymptotically to $\frac{1}{n\mathcal I}$, where $\mathcal I$ is the Fisher information of the distribution. Such an improvement is not possible for general unknown $f$, but [Stone, 1975] showed that this asymptotic convergence $\textit{is}$ possible if $f$ is $\textit{symmetric}$ about its mean. Stone's bound is asymptotic, however: the $n$ required for convergence depends in an unspecified way on the distribution $f$ and failure probability $\delta$. In this paper we give finite-sample guarantees for symmetric mean estimation in terms of Fisher information. For every $f, n, \delta$ with $n > \log \frac{1}{\delta}$, we get convergence close to a subgaussian with variance $\frac{1}{n \mathcal I_r}$, where $\mathcal I_r$ is the $r$-$\textit{smoothed}$ Fisher information with smoothing radius $r$ that decays polynomially in $n$. Such a bound essentially matches the finite-sample guarantees in the known-$f$ setting.  ( 2 min )
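As a sanity check with standard distributions (standard facts, not taken from the paper): for a Gaussian the Fisher-information rate simply recovers the sample-mean guarantee, while for the symmetric but heavier-tailed Laplace it is strictly better:

```latex
\mathcal N(\mu,\sigma^2):\quad \mathcal I = \frac{1}{\sigma^2}
  \;\Rightarrow\; \frac{1}{n\mathcal I} = \frac{\sigma^2}{n}
  \quad \text{(matches the sample mean),}
\qquad
\mathrm{Laplace}(\mu,b):\quad \sigma^2 = 2b^2,\ \mathcal I = \frac{1}{b^2}
  \;\Rightarrow\; \frac{1}{n\mathcal I} = \frac{b^2}{n} = \frac{\sigma^2}{2n}.
```

So a symmetry-aware estimator (e.g., one behaving like the median for Laplace data) can halve the variance relative to the sample mean, which is the kind of gain the finite-sample bounds above quantify.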
    HNO: Hyena Neural Operator for solving PDEs. (arXiv:2306.16524v1 [cs.LG])
    Numerically solving partial differential equations (PDEs) typically requires fine discretization to resolve the necessary spatiotemporal scales, which can be computationally expensive. Recent advances in deep learning have provided a new approach to solving PDEs that involves the use of neural operators. Neural operators are neural network architectures that learn mappings between function spaces and can solve partial differential equations from data. This study utilizes a novel neural operator called Hyena, which employs a long convolutional filter parameterized by a multilayer perceptron. The Hyena operator enjoys sub-quadratic complexity and uses a state-space model to parameterize a long convolution with a global receptive field. This mechanism enhances the model's comprehension of the input's context and enables data-dependent weights for different PDE instances. To measure how effective the layers are in solving PDEs, we conduct experiments on Burgers' equation and the Navier-Stokes equations. Our findings indicate that the Hyena neural operator can serve as an efficient and accurate model for learning the solution operator of PDEs. The data and code used can be found at: https://github.com/Saupatil07/Hyena-Neural-Operator  ( 2 min )
    A Collaborative Transfer Learning Framework for Cross-domain Recommendation. (arXiv:2306.16425v1 [cs.IR])
    In recommendation systems, there are multiple business domains to meet the diverse interests and needs of users, and the click-through rate (CTR) of each domain can be quite different, which creates a demand for CTR prediction modeling for different business domains. The industry solution is to use domain-specific models or transfer learning techniques for each domain. The disadvantage of the former is that a single-domain model does not utilize data from other domains, while the latter leverages all the data from different domains, but the fine-tuned model of transfer learning may trap the model in a local optimum of the source domain, making it difficult to fit the target domain. Meanwhile, significant differences in data quantity and feature schemas between different domains, known as domain shift, may lead to negative transfer during the transfer process. To overcome these challenges, we propose the Collaborative Cross-Domain Transfer Learning Framework (CCTL). CCTL evaluates the information gain of the source domain on the target domain using a symmetric companion network and adjusts the information transfer weight of each source domain sample using an information flow network. This approach enables full utilization of other domains' data while avoiding negative transfer. Additionally, a representation enhancement network is used as an auxiliary task to preserve domain-specific features. In comprehensive experiments on both public and real-world industrial datasets, CCTL achieved SOTA scores on offline metrics. The CCTL algorithm has also been deployed in Meituan, bringing 4.37% CTR and 5.43% GMV lifts, which is significant to the business.  ( 3 min )
    Increasing Performance And Sample Efficiency With Model-agnostic Interactive Feature Attributions. (arXiv:2306.16431v1 [cs.LG])
    Model-agnostic feature attributions can provide local insights into complex ML models. If the explanation is correct, a domain expert can validate and trust the model's decision. However, if it contradicts the expert's knowledge, related work only corrects irrelevant features to improve the model. To allow for unlimited interaction, in this paper we provide model-agnostic implementations for two popular explanation methods (Occlusion and Shapley values) to enforce entirely different attributions in the complex model. For a particular set of samples, we use the corrected feature attributions to generate extra local data, which is used to retrain the model so that it has the right explanations for those samples. Through simulated and real-data experiments on a variety of models, we show how our proposed approach can significantly improve the model's performance merely by augmenting its training dataset based on corrected explanations. Adding our interactive explanations to active learning settings increases sample efficiency significantly and outperforms existing explanatory interactive strategies. Additionally, we explore how a domain expert can provide feature attributions that are sufficiently correct to improve the model.  ( 2 min )
    CMATH: Can Your Language Model Pass Chinese Elementary School Math Test?. (arXiv:2306.16636v1 [cs.CL])
    We present the Chinese Elementary School Math Word Problems (CMATH) dataset, comprising 1.7k elementary-school-level math word problems with detailed annotations, sourced from actual Chinese workbooks and exams. This dataset aims to provide a benchmark tool for assessing the following question: to what grade level of elementary school math do the abilities of popular large language models (LLMs) correspond? We evaluate a variety of popular LLMs, including both commercial and open-source options, and discover that only GPT-4 achieves success (accuracy $\geq$ 60\%) across all six elementary school grades, while other models falter at different grade levels. Furthermore, we assess the robustness of several top-performing LLMs by augmenting the original problems in the CMATH dataset with distracting information. Our findings reveal that GPT-4 is able to maintain robustness while other models fail. We anticipate that our study will expose limitations in LLMs' arithmetic and reasoning capabilities and promote their ongoing development and advancement.  ( 2 min )
    Understanding Pathologies of Deep Heteroskedastic Regression. (arXiv:2306.16717v1 [stat.ML])
    Several recent studies have reported negative results when using heteroskedastic neural regression models to model real-world data. In particular, for overparameterized models, the mean and variance networks are powerful enough to either fit every single data point (while shrinking the predicted variances to zero), or to learn a constant prediction with an output variance exactly matching every predicted residual (i.e., explaining the targets as pure noise). This paper studies these difficulties from the perspective of statistical physics. We show that the observed instabilities are not specific to any neural network architecture but are already present in a field theory of an overparameterized conditional Gaussian likelihood model. Under light assumptions, we derive a nonparametric free energy that can be solved numerically. The resulting solutions show excellent qualitative agreement with empirical model fits on real-world data and, in particular, prove the existence of phase transitions, i.e., abrupt, qualitative differences in the behaviors of the regressors upon varying the regularization strengths on the two networks. Our work thus provides a theoretical explanation for the necessity to carefully regularize heteroskedastic regression models. Moreover, the insights from our theory suggest a scheme for optimizing this regularization which is quadratically more efficient than the naive approach.  ( 2 min )
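The first pathology above (fitting every point while shrinking the predicted variance to zero) is easy to see directly from the Gaussian negative log-likelihood; a minimal numerical illustration, not the paper's field-theoretic analysis:

```python
import math

def gaussian_nll(y, mu, var):
    """Per-sample negative log-likelihood of N(mu, var)."""
    return 0.5 * (math.log(2 * math.pi * var) + (y - mu) ** 2 / var)

y = 1.3
# When the mean network fits the target exactly (mu == y), the residual
# term vanishes and shrinking var drives the loss toward -infinity,
# so an overparameterized model is rewarded for collapsing its variance.
for var in (1.0, 1e-2, 1e-6):
    print(gaussian_nll(y, mu=1.3, var=var))
```

This is why some regularization on the variance (or mean) network is needed, which is the point the phase-transition analysis makes precise.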
    Envisioning a Next Generation Extended Reality Conferencing System with Efficient Photorealistic Human Rendering. (arXiv:2306.16541v1 [cs.CV])
    Meeting online is becoming the new normal. Creating an immersive experience for online meetings is a necessity towards more diverse and seamless environments. Efficient photorealistic rendering of human 3D dynamics is the core of immersive meetings. Current popular applications achieve real-time conferencing but fall short in delivering photorealistic human dynamics, either due to limited 2D space or the use of avatars that lack realistic interactions between participants. Recent advances in neural rendering, such as the Neural Radiance Field (NeRF), offer the potential for greater realism in metaverse meetings. However, the slow rendering speed of NeRF poses challenges for real-time conferencing. We envision a pipeline for a future extended reality metaverse conferencing system that leverages monocular video acquisition and free-viewpoint synthesis to enhance data and hardware efficiency. Towards an immersive conferencing experience, we explore an accelerated NeRF-based free-viewpoint synthesis algorithm for rendering photorealistic human dynamics more efficiently. We show that our algorithm achieves comparable rendering quality while performing training and inference 44.5% and 213% faster than state-of-the-art methods, respectively. Our exploration provides a design basis for constructing metaverse conferencing systems that can handle complex application scenarios, including dynamic scene relighting with customized themes and multi-user conferencing that harmonizes real-world people into an extended world.  ( 2 min )
    Allocating Divisible Resources on Arms with Unknown and Random Rewards. (arXiv:2306.16578v1 [cs.LG])
    We consider a decision maker allocating one unit of renewable and divisible resource in each period on a number of arms. The arms have unknown and random rewards whose means are proportional to the allocated resource and whose variances are proportional to an order $b$ of the allocated resource. In particular, if the decision maker allocates resource $A_i$ to arm $i$ in a period, then the reward $Y_i$ is $Y_i(A_i)=A_i \mu_i+A_i^b \xi_{i}$, where $\mu_i$ is the unknown mean and the noise $\xi_{i}$ is independent and sub-Gaussian. When the order $b$ ranges from 0 to 1, the framework smoothly bridges the standard stochastic multi-armed bandit and online learning with full feedback. We design two algorithms that attain the optimal gap-dependent and gap-independent regret bounds for $b\in [0,1]$, and demonstrate a phase transition at $b=1/2$. The theoretical results hinge on a novel concentration inequality we have developed that bounds a linear combination of sub-Gaussian random variables whose weights are fractional, adapted to the filtration, and monotonic.  ( 2 min )
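The reward model from the abstract is simple enough to write down directly; the two endpoints of $b$ show how it interpolates between bandit-like and full-feedback-like noise:

```python
def reward(A, mu, b, xi):
    """Y_i(A_i) = A_i * mu_i + A_i**b * xi_i, the abstract's reward model."""
    return A * mu + A ** b * xi

# b = 1: noise scales with the allocation (standard bandit regime)
r_bandit = reward(0.5, 1.0, 1.0, 0.2)   # 0.5*1.0 + 0.5*0.2
# b = 0: noise is independent of the allocation (full-feedback regime)
r_full = reward(0.5, 1.0, 0.0, 0.2)     # 0.5*1.0 + 0.2
```

Between these endpoints, the noise scale $A^b$ shrinks sub-linearly in the allocation, which is what produces the phase transition at $b=1/2$.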
    Realistic Synthetic Financial Transactions for Anti-Money Laundering Models. (arXiv:2306.16424v1 [cs.AI])
    With the widespread digitization of finance and the increasing popularity of cryptocurrencies, the sophistication of fraud schemes devised by cybercriminals is growing. Money laundering -- the movement of illicit funds to conceal their origins -- can cross bank and national boundaries, producing complex transaction patterns. The UN estimates that 2-5\% of global GDP, or \$0.8-\$2.0 trillion, is laundered globally each year. Unfortunately, real data for training machine learning models to detect laundering is generally not available, and previous synthetic data generators have had significant shortcomings. A realistic, standardized, publicly available benchmark is needed for comparing models and for the advancement of the area. To this end, this paper contributes a synthetic financial transaction dataset generator and a set of synthetically generated AML (Anti-Money Laundering) datasets. We have calibrated this agent-based generator to match real transactions as closely as possible and made the datasets public. We describe the generator in detail and demonstrate how the generated datasets can help compare different Graph Neural Networks in terms of their AML abilities. In a key way, using synthetic data in these comparisons can be even better than using real data: the ground truth labels are complete, whereas many laundering transactions in real data are never detected.  ( 2 min )
    Forecasting of the development of a partially-observed dynamical time series with the aid of time-invariance and linearity. (arXiv:2306.16593v1 [stat.ME])
    A dynamical system produces a dependent multivariate sequence, called a dynamical time series, governed by an evolution function. Since the variables in a dynamical time series at the current time point usually depend on all the variables at the previous time point, existing studies forecast the variables at a future time point by estimating the evolution function. However, in some practical situations, some variables in the dynamical time series are missing. In this study, we propose an autoregressive with slack time series (ARS) model. The ARS model simultaneously estimates the evolution function and the underlying missing variables as a slack time series, with the aid of the time-invariance and linearity of the dynamical system. This study empirically demonstrates the effectiveness of the proposed ARS model.  ( 2 min )
    Neural networks can detect model-free static arbitrage strategies. (arXiv:2306.16422v1 [q-fin.CP])
    In this paper we demonstrate both theoretically as well as numerically that neural networks can detect model-free static arbitrage opportunities whenever the market admits some. Due to the use of neural networks, our method can be applied to financial markets with a high number of traded securities and ensures almost immediate execution of the corresponding trading strategies. To demonstrate its tractability, effectiveness, and robustness we provide examples using real financial data. From a technical point of view, we prove that a single neural network can approximately solve a class of convex semi-infinite programs, which is the key result in order to derive our theoretical results that neural networks can detect model-free static arbitrage strategies whenever the financial market admits such opportunities.  ( 2 min )
    Momentum Benefits Non-IID Federated Learning Simply and Provably. (arXiv:2306.16504v1 [cs.LG])
    Federated learning is a powerful paradigm for large-scale machine learning, but it faces significant challenges due to unreliable network connections, slow communication, and substantial data heterogeneity across clients. FedAvg and SCAFFOLD are two fundamental algorithms to address these challenges. In particular, FedAvg employs multiple local updates before communicating with a central server, while SCAFFOLD maintains a control variable on each client to compensate for "client drift" in its local updates. Various methods have been proposed in the literature to enhance the convergence of these two algorithms, but they either make impractical adjustments to the algorithmic structure or rely on the assumption of bounded data heterogeneity. This paper explores the utilization of momentum to enhance the performance of FedAvg and SCAFFOLD. When all clients participate in the training process, we demonstrate that incorporating momentum allows FedAvg to converge without relying on the assumption of bounded data heterogeneity, even using a constant local learning rate. This is a novel result, since existing analyses for FedAvg require bounded data heterogeneity even with diminishing local learning rates. In the case of partial client participation, we show that momentum enables SCAFFOLD to converge provably faster without imposing any additional assumptions. Furthermore, we use momentum to develop new variance-reduced extensions of FedAvg and SCAFFOLD, which exhibit state-of-the-art convergence rates. Our experimental results support all theoretical findings.  ( 2 min )
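As a toy sketch of the mechanism (not the paper's algorithm or analysis), FedAvg with heterogeneous clients and a simple server-side momentum buffer can be simulated on scalar quadratics $f_i(x) = \tfrac12 (x - c_i)^2$, whose global optimum is the mean of the $c_i$:

```python
def fedavg_momentum(client_optima, rounds=300, local_steps=5, lr=0.1, beta=0.9):
    """Toy FedAvg with server momentum on f_i(x) = 0.5 * (x - c_i)**2.
    Clients are heterogeneous: each has a different local optimum c_i."""
    x, m = 0.0, 0.0
    for _ in range(rounds):
        deltas = []
        for c in client_optima:
            xi = x
            for _ in range(local_steps):   # local gradient steps
                xi -= lr * (xi - c)
            deltas.append(xi - x)          # client's pseudo-gradient
        m = beta * m + sum(deltas) / len(deltas)
        x += m                             # server update with momentum
    return x

x_final = fedavg_momentum([-1.0, 0.0, 3.0])  # global optimum is 2/3
```

Despite strongly heterogeneous clients and a constant local learning rate, the momentum-averaged server iterate settles at the global optimum in this toy setting.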
    Prediction of Rapid Early Progression and Survival Risk with Pre-Radiation MRI in WHO Grade 4 Glioma Patients. (arXiv:2306.16531v1 [cs.LG])
    Recent clinical research describes a subset of glioblastoma patients that exhibit rapid early progression (REP) prior to the start of radiation therapy. The current literature has thus far described this population using clinicopathologic features. To our knowledge, this study is the first to investigate the potential of conventional radiomics, sophisticated multi-resolution fractal texture features, and different molecular features (MGMT, IDH mutations) as diagnostic and prognostic tools for distinguishing REP from non-REP cases using computational and statistical modeling methods. Radiation-planning T1 post-contrast (T1C) MRI sequences of 70 patients are analyzed. An ensemble method with 5-fold cross validation over 1000 iterations offers an AUC of 0.793 with a standard deviation of 0.082 for REP versus non-REP classification. In addition, copula-based modeling under dependent censoring (where a subset of the patients may not be followed up until death) identifies significant features (p-value <0.05) for survival probability and prognostic grouping of patient cases. The prediction of survival for the patient cohort produces a precision of 0.881 with a standard deviation of 0.056. The prognostic index (PI) calculated using the fused features suggests that 84.62% of REP cases fall under the bad prognostic group, suggesting the potential of fused features to predict a higher percentage of REP cases. The experimental results further show that multi-resolution fractal texture features perform better than conventional radiomics features for REP and survival outcomes.  ( 3 min )
    SARC: Soft Actor Retrospective Critic. (arXiv:2306.16503v1 [cs.LG])
    The two-time-scale nature of SAC, an actor-critic algorithm, is characterised by the fact that the critic estimate has not converged for the actor at any given time, but since the critic learns faster than the actor, eventual consistency between the two is ensured. Various strategies have been introduced in the literature to learn better gradient estimates that help achieve better convergence. Since gradient estimates depend upon the critic, we posit that improving the critic can provide a better gradient estimate for the actor at each step. Utilizing this insight, we propose Soft Actor Retrospective Critic (SARC), where we augment the SAC critic loss with another loss term - retrospective loss - leading to faster critic convergence and, consequently, better policy gradient estimates for the actor. An existing implementation of SAC can be easily adapted to SARC with minimal modifications. Through extensive experimentation and analysis, we show that SARC provides consistent improvement over SAC on benchmark environments. We plan to open-source the code and all experiment data at: https://github.com/sukritiverma1996/SARC.  ( 2 min )
    BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models. (arXiv:2306.16678v1 [cs.CV])
    With the increasing popularity and size of vision transformers (ViTs), there has been growing interest in making them more efficient and less computationally costly for deployment on edge devices with limited computing resources. Binarization can significantly reduce the size of ViT models and their computational cost by using popcount operations when the weights and activations are binary. However, ViTs suffer a larger performance drop than CNNs when convolutional neural network (CNN) binarization methods or existing binarization methods are applied directly, on datasets with a large number of classes such as ImageNet-1k. Through extensive analysis, we find that binary vanilla ViTs such as DeiT miss out on many of the key architectural properties that give binary CNNs much higher representational capability than binary vanilla ViTs. Therefore, we propose BinaryViT, in which, inspired by the CNN architecture, we include operations from the CNN architecture in a pure ViT architecture to enrich the representational capability of a binary ViT without introducing convolutions. These include an average pooling layer instead of a token pooling layer, a block that contains multiple average pooling branches, an affine transformation right before the addition of each main residual connection, and a pyramid structure. Experimental results on the ImageNet-1k dataset show the effectiveness of these operations, which allow a binary pure ViT model to be competitive with previous state-of-the-art (SOTA) binary CNN models.  ( 3 min )
    Stochastic Methods in Variational Inequalities: Ergodicity, Bias and Refinements. (arXiv:2306.16502v1 [stat.ML])
    For the min-max optimization and variational inequality problems (VIPs) encountered in diverse machine learning tasks, Stochastic Extragradient (SEG) and Stochastic Gradient Descent Ascent (SGDA) have emerged as preeminent algorithms. Constant step-size variants of SEG/SGDA have gained popularity, with appealing benefits such as easy tuning and rapid forgetting of initial conditions, but their convergence behaviors are more complicated even in rudimentary bilinear models. Our work endeavors to elucidate and quantify the probabilistic structures intrinsic to these algorithms. By recasting constant step-size SEG/SGDA as time-homogeneous Markov chains, we establish a first-of-its-kind Law of Large Numbers and a Central Limit Theorem, demonstrating that the average iterate is asymptotically normal with a unique invariant distribution for an extensive range of monotone and non-monotone VIPs. Specializing to convex-concave min-max optimization, we characterize the relationship between the step size and the induced bias with respect to the von Neumann value. Finally, we establish that Richardson-Romberg extrapolation can improve the proximity of the average iterate to the global solution for VIPs. Our probabilistic analysis, underpinned by experiments corroborating our theoretical discoveries, harnesses techniques from optimization, Markov chains, and operator theory.  ( 2 min )
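Richardson-Romberg extrapolation combines runs at step sizes $h$ and $h/2$ to cancel the leading $O(h)$ bias of the average iterate. A stylized illustration with a synthetic bias curve (the `0.5*h` and `0.1*h**2` coefficients are invented for demonstration, not taken from the paper):

```python
def biased_estimate(h):
    """Stand-in for a constant step-size average iterate: the true value
    is 1.0, plus a synthetic O(h) bias and a smaller O(h**2) term."""
    return 1.0 + 0.5 * h + 0.1 * h ** 2

h = 0.1
plain = biased_estimate(h)
rr = 2 * biased_estimate(h / 2) - biased_estimate(h)  # cancels the O(h) term

print(abs(plain - 1.0))  # O(h) error of the single run
print(abs(rr - 1.0))     # only the O(h**2) remainder survives
```

The extrapolated estimate's error shrinks from first to second order in the step size, which is exactly the "improved proximity" claimed above.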
    Shilling Black-box Review-based Recommender Systems through Fake Review Generation. (arXiv:2306.16526v1 [cs.IR])
    Review-Based Recommender Systems (RBRSs) have attracted increasing research interest due to their ability to alleviate well-known cold-start problems. An RBRS utilizes reviews to construct user and item representations. However, in this paper we argue that such a reliance on reviews may instead expose the system to the risk of being shilled. To explore this possibility, we propose the first generation-based model for shilling attacks against RBRSs. Specifically, we learn a fake review generator through reinforcement learning, which maliciously promotes items by forcing prediction shifts after adding generated reviews to the system. By introducing auxiliary rewards that increase text fluency and diversity with the aid of pre-trained language models and aspect predictors, the generated reviews can be effective for shilling with high fidelity. Experimental results demonstrate that the proposed framework can successfully attack three different kinds of RBRSs on the Amazon corpus across three domains and on the Yelp corpus. Furthermore, human studies also show that the generated reviews are fluent and informative. Finally, equipped with Attack Review Generators (ARGs), RBRSs with adversarial training are much more robust to malicious reviews.  ( 2 min )
    Complex-valued Adaptive System Identification via Low-Rank Tensor Decomposition. (arXiv:2306.16428v1 [cs.LG])
    Machine learning (ML) and tensor-based methods have been of significant interest to the scientific community for the last few decades. In a previous work, we presented a novel tensor-based system identification framework to ease the computational burden of tensor-only architectures while still achieving exceptionally good performance. However, the derived approach only allows the processing of real-valued problems and is therefore not directly applicable to a wide range of signal processing and communications problems, which often deal with complex-valued systems. In this work, we therefore derive two new architectures that allow the processing of complex-valued signals, and show that these extensions surpass the trivial complex-valued extension of the original architecture in terms of performance, while requiring only a slight overhead in computational resources to allow for complex-valued operations.  ( 2 min )
    Learning Fair Classifiers via Min-Max F-divergence Regularization. (arXiv:2306.16552v1 [cs.LG])
    As machine learning (ML) based systems are adopted in domains such as law enforcement, criminal justice, finance, hiring, and admissions, ensuring the fairness of ML-aided decision-making is becoming increasingly important. In this paper, we focus on the problem of fair classification and introduce a novel min-max F-divergence regularization framework for learning fair classification models while preserving high accuracy. Our framework consists of two trainable networks, namely, a classifier network and a bias/fairness estimator network, where fairness is measured using the statistical notion of F-divergence. We show that F-divergence measures possess convexity and differentiability properties, and their variational representation makes them widely applicable in practical gradient-based training methods. The proposed framework can be readily adapted to multiple sensitive attributes and to high-dimensional datasets. We study the F-divergence-based training paradigm for two types of group fairness constraints, namely, demographic parity and equalized odds. We present a comprehensive set of experiments on several real-world datasets arising in multiple domains (including the COMPAS, Law Admissions, Adult Income, and CelebA datasets). To quantify the fairness-accuracy tradeoff, we introduce the notion of a fairness-accuracy receiver operating characteristic (FA-ROC) and a corresponding \textit{low-bias} FA-ROC, which we argue is an appropriate measure for evaluating different classifiers. In comparison to several existing approaches for learning fair classifiers (including pre-processing, post-processing, and other regularization methods), we show that the proposed F-divergence-based framework achieves state-of-the-art performance with respect to the trade-off between accuracy and fairness.  ( 3 min )
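The F-divergence estimator network itself is beyond the scope of the abstract, but demographic parity, one of the two group fairness constraints studied above, is simple to measure directly; a minimal sketch with hypothetical data:

```python
def demographic_parity_gap(preds, groups):
    """Absolute gap in positive-prediction rates between two groups
    (0 means the classifier satisfies demographic parity on this sample)."""
    def rate(g):
        members = [p for p, gr in zip(preds, groups) if gr == g]
        return sum(members) / len(members)
    return abs(rate(0) - rate(1))

preds  = [1, 1, 0, 1, 0, 0, 0, 1]   # binary classifier outputs
groups = [0, 0, 0, 0, 1, 1, 1, 1]   # sensitive attribute per sample
gap = demographic_parity_gap(preds, groups)  # 0.75 vs 0.25 -> gap 0.5
```

A regularizer like the one proposed above penalizes the divergence between these per-group prediction distributions during training rather than only auditing it afterwards.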
    Towards a Better Theoretical Understanding of Independent Subnetwork Training. (arXiv:2306.16484v1 [cs.LG])
    Modern advancements in large-scale machine learning would be impossible without the paradigm of data-parallel distributed computing. Since distributed computing with large-scale models imparts excessive pressure on communication channels, significant recent research has been directed toward co-designing communication compression strategies and training algorithms with the goal of reducing communication costs. While pure data parallelism allows better data scaling, it suffers from poor model scaling properties. Indeed, compute nodes are severely limited by memory constraints, preventing further increases in model size. For this reason, the latest achievements in training giant neural network models also rely on some form of model parallelism. In this work, we take a closer theoretical look at Independent Subnetwork Training (IST), which is a recently proposed and highly effective technique for solving the aforementioned problems. We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication, and provide a precise analysis of its optimization performance on a quadratic model.  ( 2 min )
    A Food Recommender System in Academic Environments Based on Machine Learning Models. (arXiv:2306.16528v1 [cs.IR])
    Background: Proper diet is an important factor in people's health. Today, with the increasing mechanization of daily life, proper eating habits and behaviors are often neglected. Food recommendation in the health domain has tried to address this issue, but the introduction of the Western nutrition style and the advancement of Western chemical medicine have raised many issues in disease treatment and nutrition. Recent advances in technology and the use of artificial intelligence methods in information systems have led to recommender systems aimed at improving people's health. Methods: A hybrid recommender system combining collaborative filtering, content-based, and knowledge-based models was used. Machine learning models such as Decision Tree, k-Nearest Neighbors (kNN), AdaBoost, and Bagging were investigated for food recommendation on 2519 students in the nutrition management system of a university. Student information, including profile information for basal metabolic rate, reservation records, and selected diet type, is collected online. From the 15 collected features, and after consulting nutrition experts, the most effective ones are selected through feature engineering. Using machine learning models based on energy indicators and students' food selection history, food from the university menu is recommended to students. Results: The AdaBoost model achieves the highest accuracy, at 73.70 percent. Conclusion: Given the importance of diet for health, recommender systems are effective at extracting useful information from large amounts of data. Keywords: Recommender system, Food behavior and habits, Machine learning, Classification  ( 3 min )
    Long-Term Hourly Scenario Generation for Correlated Wind and Solar Power combining Variational Autoencoders with Radial Basis Function Kernels. (arXiv:2306.16427v1 [cs.LG])
    Accurate generation of realistic future scenarios of renewable energy generation is crucial for long-term planning and operation of electrical systems, especially considering the increasing focus on sustainable energy and the growing penetration of renewable generation in energy matrices. These predictions enable power system operators and energy planners to effectively manage the variability and intermittency associated with renewable generation, allowing for better grid stability, improved energy management, and enhanced decision-making processes. In this paper, we propose an innovative method for generating long-term hourly scenarios for wind and solar power generation, taking into consideration the correlation between these two energy sources. To achieve this, we combine the capabilities of a Variational Autoencoder (VAE) with the additional benefits of incorporating the Radial Basis Function (RBF) kernel in our artificial neural network architecture. By incorporating them, we aim to obtain a latent space with improved regularization properties. To evaluate the effectiveness of our proposed method, we conduct experiments in a representative study scenario, utilizing real-world wind and solar power generation data from the Brazilian power system. We compare the scenarios generated by our model with the observed data and with other sets of scenarios produced by a conventional VAE architecture. Our experimental results demonstrate that the proposed method can generate long-term hourly scenarios for wind and solar power generation that are highly correlated, accurately capturing the temporal and spatial characteristics of these energy sources. Taking advantage of the benefits of RBF in obtaining a well-regularized latent space, our approach offers improved accuracy and robustness in generating long-term hourly scenarios for renewable energy generation.  ( 3 min )
    For Kernel Range Spaces a Constant Number of Queries Are Sufficient. (arXiv:2306.16516v1 [cs.CG])
    We introduce the notion of an $\varepsilon$-cover for a kernel range space. A kernel range space concerns a set of points $X \subset \mathbb{R}^d$ and the space of all queries by a fixed kernel (e.g., a Gaussian kernel $K(p,\cdot) = \exp(-\|p-\cdot\|^2)$). For a point set $X$ of size $n$, a query returns a vector of values $R_p \in \mathbb{R}^n$, where the $i$th coordinate $(R_p)_i = K(p,x_i)$ for $x_i \in X$. An $\varepsilon$-cover is a subset of points $Q \subset \mathbb{R}^d$ such that for any $p \in \mathbb{R}^d$, $\frac{1}{n} \|R_p - R_q\|_1\leq \varepsilon$ for some $q \in Q$. This is a smooth analog of Haussler's notion of $\varepsilon$-covers for combinatorial range spaces (e.g., defined by subsets of points within a ball query) where the resulting vectors $R_p$ are in $\{0,1\}^n$ instead of $[0,1]^n$. The kernel versions of these range spaces show up in data analysis tasks where the coordinates may be uncertain or imprecise, and hence one wishes to add some flexibility in the notion of inside and outside of a query range. Our main result is that, unlike combinatorial range spaces, the size of kernel $\varepsilon$-covers is independent of the input size $n$ and dimension $d$. We obtain a bound of $(1/\varepsilon)^{\tilde O(1/\varepsilon^2)}$, where $\tilde{O}(f(1/\varepsilon))$ hides log factors in $(1/\varepsilon)$ that can depend on the kernel. This implies that by relaxing the notion of boundaries in range queries, eventually the curse of dimensionality disappears, and may help explain the success of machine learning in very high-dimensions. We also complement this result with a lower bound of almost $(1/\varepsilon)^{\Omega(1/\varepsilon)}$, showing the exponential dependence on $1/\varepsilon$ is necessary.  ( 3 min )
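The objects in the definition are easy to compute directly. Here is a hedged sketch of the query vector $R_p$ and the normalized $\ell_1$ distance that an $\varepsilon$-cover must control; the bandwidth and point set are illustrative.

```python
import math

def query_vector(p, X, bandwidth=1.0):
    """R_p: Gaussian-kernel responses K(p, x_i) = exp(-||p - x_i||^2 / bw), x_i in X."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [math.exp(-sqdist(p, x) / bandwidth) for x in X]

def cover_distance(p, q, X):
    """(1/n) * ||R_p - R_q||_1 -- the metric an epsilon-cover must respect."""
    Rp, Rq = query_vector(p, X), query_vector(q, X)
    return sum(abs(a - b) for a, b in zip(Rp, Rq)) / len(X)

X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
print(cover_distance((0.1, 0.1), (0.1, 0.1), X))    # 0.0: identical queries
print(cover_distance((0.1, 0.1), (0.15, 0.1), X))   # small for nearby queries
```

Because the coordinates of $R_p$ vary smoothly in $p$, nearby queries are close in this metric, which is the intuition behind covers whose size does not grow with $n$ or $d$.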
    Would I have gotten that reward? Long-term credit assignment by counterfactual contribution analysis. (arXiv:2306.16803v1 [cs.LG])
    To make reinforcement learning more sample efficient, we need better credit assignment methods that measure an action's influence on future rewards. Building upon Hindsight Credit Assignment (HCA), we introduce Counterfactual Contribution Analysis (COCOA), a new family of model-based credit assignment algorithms. Our algorithms achieve precise credit assignment by measuring the contribution of actions upon obtaining subsequent rewards, by quantifying a counterfactual query: "Would the agent still have reached this reward if it had taken another action?". We show that measuring contributions w.r.t. rewarding states, as is done in HCA, results in spurious estimates of contributions, causing HCA to degrade towards the high-variance REINFORCE estimator in many relevant environments. Instead, we measure contributions w.r.t. rewards or learned representations of the rewarding objects, resulting in gradient estimates with lower variance. We run experiments on a suite of problems specifically designed to evaluate long-term credit assignment capabilities. By using dynamic programming, we measure ground-truth policy gradients and show that the improved performance of our new model-based credit assignment methods is due to lower bias and variance compared to HCA and common baselines. Our results demonstrate how modeling action contributions towards rewarding outcomes can be leveraged for credit assignment, opening a new path towards sample-efficient reinforcement learning.  ( 2 min )
    Understanding Pathologies of Deep Heteroskedastic Regression. (arXiv:2306.16717v1 [stat.ML])
    Several recent studies have reported negative results when using heteroskedastic neural regression models to model real-world data. In particular, for overparameterized models, the mean and variance networks are powerful enough to either fit every single data point (while shrinking the predicted variances to zero), or to learn a constant prediction with an output variance exactly matching every predicted residual (i.e., explaining the targets as pure noise). This paper studies these difficulties from the perspective of statistical physics. We show that the observed instabilities are not specific to any neural network architecture but are already present in a field theory of an overparameterized conditional Gaussian likelihood model. Under light assumptions, we derive a nonparametric free energy that can be solved numerically. The resulting solutions show excellent qualitative agreement with empirical model fits on real-world data and, in particular, prove the existence of phase transitions, i.e., abrupt, qualitative differences in the behaviors of the regressors upon varying the regularization strengths on the two networks. Our work thus provides a theoretical explanation for the necessity to carefully regularize heteroskedastic regression models. Moreover, the insights from our theory suggest a scheme for optimizing this regularization which is quadratically more efficient than the naive approach.  ( 2 min )
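Both pathologies can be reproduced with a four-point data set and the Gaussian negative log-likelihood. The sketch below is a toy illustration of the failure modes the abstract describes, not the field-theoretic analysis itself.

```python
import math

def gauss_nll(y, mu, var):
    """Per-point negative log-likelihood of N(mu, var)."""
    return 0.5 * (math.log(2 * math.pi * var) + (y - mu) ** 2 / var)

ys = [0.0, 1.0, 4.0, 4.5]

# Pathology 1: an overparameterized mean network interpolates every point and
# shrinks the predicted variances toward zero -- the NLL diverges to -infinity.
tiny = 1e-8
interpolating = sum(gauss_nll(y, y, tiny) for y in ys)

# Pathology 2: a constant mean with output variance matched to each residual
# also "explains" the data, treating the entire signal as pure noise.
mean = sum(ys) / len(ys)
noise_fit = sum(gauss_nll(y, mean, max((y - mean) ** 2, 1e-12)) for y in ys)

print(interpolating, noise_fit)  # the interpolating NLL is hugely negative
```

Without regularization on both networks, an optimizer will drift toward one of these degenerate solutions, which is precisely why the paper argues the regularization strengths must be chosen carefully.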
    Statistical Inference for High-Dimensional Linear Regression with Blockwise Missing Data. (arXiv:2106.03344v2 [stat.ME] UPDATED)
    Blockwise missing data occurs frequently when we integrate multisource or multimodality data where different sources or modalities contain complementary information. In this paper, we consider a high-dimensional linear regression model with blockwise missing covariates and a partially observed response variable. Under this framework, we propose a computationally efficient estimator for the regression coefficient vector based on carefully constructed unbiased estimating equations and a blockwise imputation procedure, and obtain its rate of convergence. Furthermore, building upon an innovative projected estimating equation technique that intrinsically achieves bias-correction of the initial estimator, we propose a nearly unbiased estimator for each individual regression coefficient, which is asymptotically normally distributed under mild conditions. Based on these debiased estimators, asymptotically valid confidence intervals and statistical tests about each regression coefficient are constructed. Numerical studies and application analysis of the Alzheimer's Disease Neuroimaging Initiative data show that the proposed method performs better and benefits more from unsupervised samples than existing methods.  ( 2 min )
    Joint Multi-view Unsupervised Feature Selection and Graph Learning. (arXiv:2204.08247v2 [cs.CV] UPDATED)
    Despite significant progress, previous multi-view unsupervised feature selection methods mostly suffer from two limitations. First, they generally utilize either cluster structure or similarity structure to guide the feature selection, which neglect the possibility of a joint formulation with mutual benefits. Second, they often learn the similarity structure by either global structure learning or local structure learning, which lack the capability of graph learning with both global and local structural awareness. In light of this, this paper presents a joint multi-view unsupervised feature selection and graph learning (JMVFG) approach. Particularly, we formulate the multi-view feature selection with orthogonal decomposition, where each target matrix is decomposed into a view-specific basis matrix and a view-consistent cluster indicator. The cross-space locality preservation is incorporated to bridge the cluster structure learning in the projected space and the similarity learning (i.e., graph learning) in the original space. Further, a unified objective function is presented to enable the simultaneous learning of the cluster structure, the global and local similarity structures, and the multi-view consistency and inconsistency, upon which an alternating optimization algorithm is developed with theoretically proved convergence. Extensive experiments on a variety of real-world multi-view datasets demonstrate the superiority of our approach for both the multi-view feature selection and graph learning tasks. The code is available at https://github.com/huangdonghere/JMVFG.  ( 3 min )
    Numerical Data Imputation for Multimodal Data Sets: A Probabilistic Nearest-Neighbor Kernel Density Approach. (arXiv:2306.16906v1 [stat.ML])
    Numerical data imputation algorithms replace missing values by estimates to leverage incomplete data sets. Current imputation methods seek to minimize the error between the unobserved ground truth and the imputed values. But this strategy can create artifacts leading to poor imputation in the presence of multimodal or complex distributions. To tackle this problem, we introduce the $k$NN$\times$KDE algorithm: a data imputation method combining nearest neighbor estimation ($k$NN) and density estimation with Gaussian kernels (KDE). We compare our method with previous data imputation methods using artificial and real-world data with different data missing scenarios and various data missing rates, and show that our method can cope with complex original data structure, yields lower data imputation errors, and provides probabilistic estimates with higher likelihood than current methods. We release the code in open-source for the community: https://github.com/DeltaFloflo/knnxkde  ( 2 min )
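A toy 1-D version of the idea: select the $k$ nearest rows on the observed columns, then draw from a Gaussian KDE placed on the neighbors' values in the missing column, so multimodal neighborhoods stay multimodal. All hyperparameters here are illustrative; this is a sketch of the combination, not the released knnxkde implementation.

```python
import math
import random

def knn_kde_impute(rows, target_idx, miss_col, k=3, bandwidth=0.5,
                   n_samples=200, rng=None):
    """Impute rows[target_idx][miss_col]: find the k nearest rows on the
    observed columns, then sample from a Gaussian KDE over their values
    in the missing column."""
    rng = rng or random.Random(0)
    obs_cols = [c for c in range(len(rows[0])) if c != miss_col]
    target = rows[target_idx]
    def dist(r):
        return math.sqrt(sum((r[c] - target[c]) ** 2 for c in obs_cols))
    neighbors = sorted((r for i, r in enumerate(rows) if i != target_idx),
                       key=dist)[:k]
    donors = [r[miss_col] for r in neighbors]
    # sampling from the KDE: pick a donor uniformly, add Gaussian noise
    draws = [rng.choice(donors) + rng.gauss(0, bandwidth) for _ in range(n_samples)]
    return sum(draws) / len(draws), draws   # point estimate + probabilistic draws

# Last row has a missing second column (0.0 is just a placeholder, never read).
rows = [[0.0, 1.0], [0.1, 1.2], [0.2, 0.9], [5.0, 8.0], [0.05, 0.0]]
est, draws = knn_kde_impute(rows, target_idx=4, miss_col=1)
print(est)   # close to ~1.0, the value of the nearby cluster, not pulled to 8.0
```

Keeping the full set of draws (rather than only their mean) is what gives the probabilistic, likelihood-friendly estimates the abstract emphasizes.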
    Forecasting of the development of a partially-observed dynamical time series with the aid of time-invariance and linearity. (arXiv:2306.16593v1 [stat.ME])
    A dynamical system produces a dependent multivariate sequence, called a dynamical time series, generated by an evolution function. Since the variables at the current time point usually depend on all variables at the previous time point, existing studies forecast future values by estimating the evolution function. In practice, however, some variables of the dynamical time series are missing. In this study, we propose an autoregressive with slack time series (ARS) model, which simultaneously estimates the evolution function and the underlying missing variables as a slack time series, with the aid of the time-invariance and linearity of the dynamical system. We empirically demonstrate the effectiveness of the proposed ARS model.  ( 2 min )
    Allocating Divisible Resources on Arms with Unknown and Random Rewards. (arXiv:2306.16578v1 [cs.LG])
    We consider a decision maker allocating one unit of renewable and divisible resource in each period on a number of arms. The arms have unknown and random rewards whose means are proportional to the allocated resource and whose variances are proportional to an order $b$ of the allocated resource. In particular, if the decision maker allocates resource $A_i$ to arm $i$ in a period, then the reward $Y_i$ is $Y_i(A_i) = A_i \mu_i + A_i^b \xi_i$, where $\mu_i$ is the unknown mean and the noise $\xi_i$ is independent and sub-Gaussian. When the order $b$ ranges from 0 to 1, the framework smoothly bridges the standard stochastic multi-armed bandit and online learning with full feedback. We design two algorithms that attain the optimal gap-dependent and gap-independent regret bounds for $b\in [0,1]$, and demonstrate a phase transition at $b=1/2$. The theoretical results hinge on a novel concentration inequality we have developed that bounds a linear combination of sub-Gaussian random variables whose weights are fractional, adapted to the filtration, and monotonic.  ( 2 min )
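The reward model is simple to simulate. The sketch below draws rewards $Y_i(A_i) = A_i\mu_i + A_i^b \xi_i$ and shows why $b$ matters: with $b=1$ the noise scales with the allocation, so even a small allocation yields an estimate of $\mu_i$ with allocation-independent noise (the full-feedback end of the bridge), while $b=0$ leaves the noise fixed, so small allocations are barely informative (the bandit end). All parameter values are illustrative.

```python
import random

def pull(allocation, mu, b, rng):
    """Reward for allocating `allocation` of the unit resource to one arm:
    the mean scales linearly, the noise scales as allocation**b."""
    return allocation * mu + (allocation ** b) * rng.gauss(0, 1)

def empirical_mean_estimate(mu, b, allocation=0.1, rounds=20000, seed=0):
    rng = random.Random(seed)
    total = sum(pull(allocation, mu, b, rng) for _ in range(rounds))
    return total / (rounds * allocation)   # unbiased estimate of mu

est_b1 = empirical_mean_estimate(mu=2.0, b=1.0)  # noise shrinks with allocation
est_b0 = empirical_mean_estimate(mu=2.0, b=0.0)  # noise fixed: noisier estimate
print(est_b1, est_b0)   # both near 2.0; the b=0 estimate has ~10x the std dev
```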
    Learning Mixtures of Gaussians with Censored Data. (arXiv:2305.04127v2 [cs.LG] UPDATED)
    We study the problem of learning mixtures of Gaussians with censored data. Statistical learning with censored data is a classical problem with numerous practical applications; however, finite-sample guarantees for even simple latent variable models such as Gaussian mixtures are missing. Formally, we are given censored data from a mixture of univariate Gaussians $$ \sum_{i=1}^k w_i \mathcal{N}(\mu_i,\sigma^2), $$ i.e. the sample is observed only if it lies inside a set $S$. The goal is to learn the weights $w_i$ and the means $\mu_i$. We propose an algorithm that takes only $\frac{1}{\varepsilon^{O(k)}}$ samples to estimate the weights $w_i$ and the means $\mu_i$ within $\varepsilon$ error.  ( 2 min )
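The censored sampling model is easy to simulate: draw from the mixture, but observe a sample only if it lands in $S$. The sketch below is a data-generation illustration (not the paper's estimation algorithm) showing how censoring biases what is observed.

```python
import random

def sample_censored_mixture(weights, means, sigma, S, n, seed=0):
    """Draw from sum_i w_i N(mu_i, sigma^2), observing a sample only if it
    lies in the censoring interval S = (lo, hi)."""
    rng = random.Random(seed)
    kept = []
    while len(kept) < n:
        i = 0 if rng.random() < weights[0] else 1
        x = rng.gauss(means[i], sigma)
        if S[0] <= x <= S[1]:
            kept.append(x)
    return kept

# Mixture 0.5*N(-2,1) + 0.5*N(2,1), censored to the window S = [0, 10]:
data = sample_censored_mixture((0.5, 0.5), (-2.0, 2.0), 1.0, (0.0, 10.0), n=5000)
print(min(data) >= 0.0)        # True: nothing outside S is ever observed
print(sum(data) / len(data))   # ~2: the left component is almost entirely censored
```

Naively fitting a mixture to `data` would recover neither the weights nor the left mean, which is the difficulty the paper's algorithm is designed to overcome.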
    Learning thermodynamically constrained equations of state with uncertainty. (arXiv:2306.17004v1 [physics.data-an])
    Numerical simulations of high energy-density experiments require equation of state (EOS) models that relate a material's thermodynamic state variables -- specifically pressure, volume/density, energy, and temperature. EOS models are typically constructed using a semi-empirical parametric methodology, which assumes a physics-informed functional form with many tunable parameters calibrated using experimental/simulation data. Since there are inherent uncertainties in the calibration data (parametric uncertainty) and the assumed functional EOS form (model uncertainty), it is essential to perform uncertainty quantification (UQ) to improve confidence in the EOS predictions. Model uncertainty is challenging for UQ studies since it requires exploring the space of all possible physically consistent functional forms. Thus, it is often neglected in favor of parametric uncertainty, which is easier to quantify without violating thermodynamic laws. This work presents a data-driven machine learning approach to constructing EOS models that naturally captures model uncertainty while satisfying the necessary thermodynamic consistency and stability constraints. We propose a novel framework based on physics-informed Gaussian process regression (GPR) that automatically captures total uncertainty in the EOS and can be jointly trained on both simulation and experimental data sources. A GPR model for the shock Hugoniot is derived and its uncertainties are quantified using the proposed framework. We apply the proposed model to learn the EOS for the diamond solid state of carbon, using both density functional theory data and experimental shock Hugoniot data to train the model and show that the prediction uncertainty reduces by considering the thermodynamic constraints.  ( 3 min )
    Medoid splits for efficient random forests in metric spaces. (arXiv:2306.17031v1 [stat.ME])
    This paper revisits an adaptation of the random forest algorithm for Fr\'echet regression, addressing the challenge of regression in the context of random objects in metric spaces. Recognizing the limitations of previous approaches, we introduce a new splitting rule that circumvents the computationally expensive operation of Fr\'echet means by substituting with a medoid-based approach. We validate this approach by demonstrating its asymptotic equivalence to Fr\'echet mean-based procedures and establish the consistency of the associated regression estimator. The paper provides a sound theoretical framework and a more efficient computational approach to Fr\'echet regression, broadening its application to non-standard data types and complex use cases.  ( 2 min )
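The substitution at the heart of the new splitting rule can be stated in a few lines: where the Fréchet mean minimizes summed squared distances over the whole metric space, the medoid minimizes it over the sample only, so it needs nothing beyond pairwise distances. A toy sketch on a circle with geodesic distance follows; the squared-distance convention is one common choice, and the paper's exact rule may differ in details.

```python
import math

def medoid(points, dist):
    """The sample point minimizing the sum of squared distances to the others.
    Unlike the Frechet mean, this needs only pairwise distances -- no
    optimization over the ambient space -- which is the efficiency gain."""
    costs = [sum(dist(p, q) ** 2 for q in points) for p in points]
    return points[costs.index(min(costs))]

def arc_dist(a, b):
    """Geodesic (arc-length) distance between angles on the unit circle."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

angles = [0.0, 0.2, 0.3, 0.4, 3.0]   # a tight cluster plus one far point
print(medoid(angles, arc_dist))       # 0.4: pulled toward the far point,
                                      # but always an actual sample point
```

Returning an actual sample point is also what keeps the split well-defined for arbitrary metric-space-valued data, where a continuous mean may be expensive or ambiguous to compute.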
    Temporal Robustness against Data Poisoning. (arXiv:2302.03684v2 [cs.LG] UPDATED)
    Data poisoning considers cases when an adversary manipulates the behavior of machine learning algorithms through malicious training data. Existing threat models of data poisoning center around a single metric, the number of poisoned samples. In consequence, if attackers can poison more samples than expected with affordable overhead, as in many practical scenarios, they may be able to render existing defenses ineffective in a short time. To address this issue, we leverage timestamps denoting the birth dates of data, which are often available but neglected in the past. Benefiting from these timestamps, we propose a temporal threat model of data poisoning with two novel metrics, earliness and duration, which respectively measure how long an attack started in advance and how long an attack lasted. Using these metrics, we define the notions of temporal robustness against data poisoning, providing a meaningful sense of protection even with unbounded amounts of poisoned samples. We present a benchmark with an evaluation protocol simulating continuous data collection and periodic deployments of updated models, thus enabling empirical evaluation of temporal robustness. Lastly, we develop and also empirically verify a baseline defense, namely temporal aggregation, offering provable temporal robustness and highlighting the potential of our temporal threat model for data poisoning.  ( 2 min )
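A toy version of the temporal aggregation defense: train one base model per time period and predict by majority vote over the per-period models, so an attack whose duration covers only a minority of periods cannot flip the prediction no matter how many samples it injects. The "models" below are just per-period majority labels standing in for arbitrary learners; this illustrates the aggregation logic only, not the paper's benchmark.

```python
from collections import Counter

def train_per_period(samples_by_period):
    """One base 'model' per period: here simply the period's majority label,
    a stand-in for any learner trained only on that period's data."""
    return [Counter(period).most_common(1)[0][0] for period in samples_by_period]

def aggregate_predict(models):
    """Majority vote across the per-period models."""
    return Counter(models).most_common(1)[0][0]

# Clean data says label 1 in every period; the attacker fully poisons two
# periods with unbounded numbers of label-0 samples (duration = 2 periods).
clean = [[1, 1, 1]] * 5
poisoned = clean[:3] + [[0] * 1000, [0] * 1000]
models = train_per_period(poisoned)
print(aggregate_predict(models))   # 1: the clean majority of periods prevails
```

This is the sense in which the defense offers provable robustness against *unbounded* numbers of poisoned samples, as long as the attack's duration (and earliness) is bounded.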
    Solving Kernel Ridge Regression with Gradient-Based Optimization Methods. (arXiv:2306.16838v1 [stat.ML])
    Kernel ridge regression, KRR, is a non-linear generalization of linear ridge regression. Here, we introduce an equivalent formulation of the objective function of KRR, opening up both for using other penalties than the ridge penalty and for studying kernel ridge regression from the perspective of gradient descent. Using a continuous-time perspective, we derive a closed-form solution, kernel gradient flow, KGF, with regularization through early stopping, which allows us to theoretically bound the differences between KGF and KRR. We generalize KRR by replacing the ridge penalty with the $\ell_1$ and $\ell_\infty$ penalties and utilize the fact that analogously to the similarities between KGF and KRR, the solutions obtained when using these penalties are very similar to those obtained from forward stagewise regression (also known as coordinate descent) and sign gradient descent in combination with early stopping. Thus the need for computationally heavy proximal gradient descent algorithms can be alleviated. We show theoretically and empirically how these penalties, and corresponding gradient-based optimization algorithms, produce signal-driven and robust regression solutions, respectively. We also investigate kernel gradient descent where the kernel is allowed to change during training, and theoretically address the effects this has on generalization. Based on our findings, we propose an update scheme for the bandwidth of translational-invariant kernels, where we let the bandwidth decrease to zero during training, thus circumventing the need for hyper-parameter selection. We demonstrate on real and synthetic data how decreasing the bandwidth during training outperforms using a constant bandwidth, selected by cross-validation and marginal likelihood maximization. We also show that using a decreasing bandwidth, we are able to achieve both zero training error and a double descent behavior.  ( 3 min )
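The gradient-descent view of KRR is concrete enough to run in a few lines: descending the unpenalized loss $\tfrac12\|y - K\alpha\|^2$ to convergence interpolates the targets, while stopping the flow early yields a shrunken, ridge-like fit; the continuous-time version of this path is the KGF solution the abstract describes. A pure-Python sketch with an illustrative Gaussian kernel and step size:

```python
import math

def gaussian_kernel_matrix(xs, bw=1.0):
    return [[math.exp(-(a - b) ** 2 / bw) for b in xs] for a in xs]

def matvec(M, v):
    return [sum(m * w for m, w in zip(row, v)) for row in M]

def kernel_gradient_descent(K, y, lr=0.05, steps=100):
    """Gradient descent on 0.5*||y - K a||^2 in the dual coefficients a.
    Run to convergence it interpolates y; stopped early, it is implicitly
    regularized -- the discrete analogue of kernel gradient flow."""
    a = [0.0] * len(y)
    for _ in range(steps):
        resid = [yi - ki for yi, ki in zip(y, matvec(K, a))]
        grad = [-g for g in matvec(K, resid)]
        a = [ai - lr * gi for ai, gi in zip(a, grad)]
    return a

xs = [0.0, 1.0, 2.0]
y = [1.0, 2.0, 0.5]
K = gaussian_kernel_matrix(xs)
early = matvec(K, kernel_gradient_descent(K, y, steps=30))      # shrunk fit
late = matvec(K, kernel_gradient_descent(K, y, steps=20000))    # interpolates y
print(early)   # shrunk toward zero, like a ridge-penalized fit
print(late)    # essentially equal to y
```

Replacing this plain gradient step with sign gradient descent or coordinate-wise (stagewise) updates gives the $\ell_\infty$- and $\ell_1$-flavored paths the paper studies.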
    Differentially Private Algorithms for the Stochastic Saddle Point Problem with Optimal Rates for the Strong Gap. (arXiv:2302.12909v2 [cs.LG] UPDATED)
    We show that convex-concave Lipschitz stochastic saddle point problems (also known as stochastic minimax optimization) can be solved under the constraint of $(\epsilon,\delta)$-differential privacy with \emph{strong (primal-dual) gap} rate of $\tilde O\big(\frac{1}{\sqrt{n}} + \frac{\sqrt{d}}{n\epsilon}\big)$, where $n$ is the dataset size and $d$ is the dimension of the problem. This rate is nearly optimal, based on existing lower bounds in differentially private stochastic optimization. Specifically, we prove a tight upper bound on the strong gap via novel implementation and analysis of the recursive regularization technique repurposed for saddle point problems. We show that this rate can be attained with $O\big(\min\big\{\frac{n^2\epsilon^{1.5}}{\sqrt{d}}, n^{3/2}\big\}\big)$ gradient complexity, and $\tilde{O}(n)$ gradient complexity if the loss function is smooth. As a byproduct of our method, we develop a general algorithm that, given a black-box access to a subroutine satisfying a certain $\alpha$ primal-dual accuracy guarantee with respect to the empirical objective, gives a solution to the stochastic saddle point problem with a strong gap of $\tilde{O}(\alpha+\frac{1}{\sqrt{n}})$. We show that this $\alpha$-accuracy condition is satisfied by standard algorithms for the empirical saddle point problem such as the proximal point method and the stochastic gradient descent ascent algorithm. Further, we show that even for simple problems it is possible for an algorithm to have zero weak gap and suffer from $\Omega(1)$ strong gap. We also show that there exists a fundamental tradeoff between stability and accuracy. Specifically, we show that any $\Delta$-stable algorithm has empirical gap $\Omega\big(\frac{1}{\Delta n}\big)$, and that this bound is tight. This result also holds for empirical risk minimization problems in particular and may be of independent interest.  ( 3 min )
    Finite-Sample Symmetric Mean Estimation with Fisher Information Rate. (arXiv:2306.16573v1 [math.ST])
    The mean of an unknown variance-$\sigma^2$ distribution $f$ can be estimated from $n$ samples with variance $\frac{\sigma^2}{n}$ and nearly corresponding subgaussian rate. When $f$ is known up to translation, this can be improved asymptotically to $\frac{1}{n\mathcal I}$, where $\mathcal I$ is the Fisher information of the distribution. Such an improvement is not possible for general unknown $f$, but [Stone, 1975] showed that this asymptotic convergence $\textit{is}$ possible if $f$ is $\textit{symmetric}$ about its mean. Stone's bound is asymptotic, however: the $n$ required for convergence depends in an unspecified way on the distribution $f$ and failure probability $\delta$. In this paper we give finite-sample guarantees for symmetric mean estimation in terms of Fisher information. For every $f, n, \delta$ with $n > \log \frac{1}{\delta}$, we get convergence close to a subgaussian with variance $\frac{1}{n \mathcal I_r}$, where $\mathcal I_r$ is the $r$-$\textit{smoothed}$ Fisher information with smoothing radius $r$ that decays polynomially in $n$. Such a bound essentially matches the finite-sample guarantees in the known-$f$ setting.  ( 2 min )
    Understanding Graph Neural Networks with Generalized Geometric Scattering Transforms. (arXiv:1911.06253v5 [stat.ML] UPDATED)
    The scattering transform is a multilayered wavelet-based deep learning architecture that acts as a model of convolutional neural networks. Recently, several works have introduced generalizations of the scattering transform for non-Euclidean settings such as graphs. Our work builds upon these constructions by introducing windowed and non-windowed geometric scattering transforms for graphs based upon a very general class of asymmetric wavelets. We show that these asymmetric graph scattering transforms have many of the same theoretical guarantees as their symmetric counterparts. As a result, the proposed construction unifies and extends known theoretical results for many of the existing graph scattering architectures. In doing so, this work helps bridge the gap between geometric scattering and other graph neural networks by introducing a large family of networks with provable stability and invariance guarantees. These results lay the groundwork for future deep learning architectures for graph-structured data that have learned filters and also provably have desirable theoretical properties.  ( 2 min )
    Provable Advantage of Curriculum Learning on Parity Targets with Mixed Inputs. (arXiv:2306.16921v1 [cs.LG])
    Experimental results have shown that curriculum learning, i.e., presenting simpler examples before more complex ones, can improve the efficiency of learning. Some recent theoretical results also showed that changing the sampling distribution can help neural networks learn parities, with formal results only for large learning rates and one-step arguments. Here we show a separation result in the number of training steps with standard (bounded) learning rates on a common sample distribution: if the data distribution is a mixture of sparse and dense inputs, there exists a regime in which a 2-layer ReLU neural network trained by a curriculum noisy-GD (or SGD) algorithm that uses sparse examples first, can learn parities of sufficiently large degree, while any fully connected neural network of possibly larger width or depth trained by noisy-GD on the unordered samples cannot learn without additional steps. We also provide experimental results supporting the qualitative separation beyond the specific regime of the theoretical results.  ( 2 min )
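To make the target concrete: a $k$-sparse parity is the product of a few $\pm1$ coordinates, and the "easy" curriculum examples are inputs sparse enough that the relevant coordinates stand out. The sketch below shows the target and one plausible reading of the two input types; the paper's exact mixture distribution may differ.

```python
import random

def parity(x, support):
    """Degree-k parity target: product of the +-1 coordinates on `support`."""
    p = 1
    for i in support:
        p *= x[i]
    return p

def sample_input(d, sparse, rng):
    """Dense input: uniform +-1 coordinates. Sparse input: mostly +1 with a
    few -1 entries, making the relevant coordinates easier to detect -- the
    'easy' examples a curriculum would present first."""
    if sparse:
        x = [1] * d
        for i in rng.sample(range(d), k=3):
            x[i] = -1
        return x
    return [rng.choice((-1, 1)) for _ in range(d)]

rng = random.Random(0)
d, support = 10, (1, 4, 7)
dense = [sample_input(d, False, rng) for _ in range(4)]
labels = [parity(x, support) for x in dense]
print(labels)                                  # +-1 labels of a degree-3 parity
print(parity(sample_input(d, True, rng), support))   # label of a sparse example
```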
    Gradient flows on graphons: existence, convergence, continuity equations. (arXiv:2111.09459v3 [math.PR] UPDATED)
    Wasserstein gradient flows on probability measures have found a host of applications in various optimization problems. They typically arise as the continuum limit of exchangeable particle systems evolving by some mean-field interaction involving a gradient-type potential. However, in many problems, such as in multi-layer neural networks, the so-called particles are edge weights on large graphs whose nodes are exchangeable. Such large graphs are known to converge to continuum limits called graphons as their size grows to infinity. We show that the Euclidean gradient flow of a suitable function of the edge-weights converges to a novel continuum limit given by a curve on the space of graphons that can be appropriately described as a gradient flow or, more technically, a curve of maximal slope. Several natural functions on graphons, such as homomorphism functions and the scalar entropy, are covered by our set-up, and the examples have been worked out in detail.  ( 2 min )
    Local Risk Bounds for Statistical Aggregation. (arXiv:2306.17151v1 [math.ST])
    In the problem of aggregation, the aim is to combine a given class of base predictors to achieve predictions nearly as accurate as the best one. In this flexible framework, no assumption is made on the structure of the class or the nature of the target. Aggregation has been studied in both sequential and statistical contexts. Despite some important differences between the two problems, the classical results in both cases feature the same global complexity measure. In this paper, we revisit and tighten classical results in the theory of aggregation in the statistical setting by replacing the global complexity with a smaller, local one. Some of our proofs build on the PAC-Bayes localization technique introduced by Catoni. Among other results, we prove localized versions of the classical bound for the exponential weights estimator due to Leung and Barron and deviation-optimal bounds for the Q-aggregation estimator. These bounds improve over the results of Dai, Rigollet and Zhang for fixed design regression and the results of Lecu\'e and Rigollet for random design regression.  ( 2 min )
    Lossy Image Compression with Conditional Diffusion Models. (arXiv:2209.06950v5 [eess.IV] UPDATED)
    This paper outlines an end-to-end optimized lossy image compression framework using diffusion generative models. The approach relies on the transform coding paradigm, where an image is mapped into a latent space for entropy coding and, from there, mapped back to the data space for reconstruction. In contrast to VAE-based neural compression, where the (mean) decoder is a deterministic neural network, our decoder is a conditional diffusion model. Our approach thus introduces an additional "content" latent variable on which the reverse diffusion process is conditioned and uses this variable to store information about the image. The remaining "texture" variables characterizing the diffusion process are synthesized at decoding time. We show that the model's performance can be tuned toward perceptual metrics of interest. Our extensive experiments involving multiple datasets and image quality assessment metrics show that our approach yields stronger reported FID scores than the GAN-based model, while also yielding competitive performance with VAE-based models in several distortion metrics. Furthermore, training the diffusion with X-parameterization enables high-quality reconstructions in only a handful of decoding steps, greatly affecting the model's practicality.  ( 2 min )
    Efficient and Multiply Robust Risk Estimation under General Forms of Dataset Shift. (arXiv:2306.16406v2 [stat.ME] UPDATED)
    Statistical machine learning methods often face the challenge of limited data available from the population of interest. One remedy is to leverage data from auxiliary source populations, which share some conditional distributions or are linked in other ways with the target domain. Techniques leveraging such \emph{dataset shift} conditions are known as \emph{domain adaptation} or \emph{transfer learning}. Despite extensive literature on dataset shift, limited works address how to efficiently use the auxiliary populations to improve the accuracy of risk evaluation for a given machine learning task in the target population. In this paper, we study the general problem of efficiently estimating target population risk under various dataset shift conditions, leveraging semiparametric efficiency theory. We consider a general class of dataset shift conditions, which includes three popular conditions -- covariate, label and concept shift -- as special cases. We allow for partially non-overlapping support between the source and target populations. We develop efficient and multiply robust estimators along with a straightforward specification test of these dataset shift conditions. We also derive efficiency bounds for two other dataset shift conditions, posterior drift and location-scale shift. Simulation studies support the efficiency gains due to leveraging plausible dataset shift conditions.  ( 2 min )
    Safe Model-Based Multi-Agent Mean-Field Reinforcement Learning. (arXiv:2306.17052v1 [cs.LG])
    Many applications, e.g., in shared mobility, require coordinating a large number of agents. Mean-field reinforcement learning addresses the resulting scalability challenge by optimizing the policy of a representative agent. In this paper, we address an important generalization where there exist global constraints on the distribution of agents (e.g., requiring capacity constraints or minimum coverage requirements to be met). We propose Safe-$\text{M}^3$-UCRL, the first model-based algorithm that attains safe policies even in the case of unknown transition dynamics. As a key ingredient, it uses epistemic uncertainty in the transition model within a log-barrier approach to ensure pessimistic constraints satisfaction with high probability. We showcase Safe-$\text{M}^3$-UCRL on the vehicle repositioning problem faced by many shared mobility operators and evaluate its performance through simulations built on Shenzhen taxi trajectory data. Our algorithm effectively meets the demand in critical areas while ensuring service accessibility in regions with low demand.  ( 2 min )
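    The log-barrier idea the abstract mentions can be illustrated in isolation. The toy problem below is an assumption for illustration only, not the paper's algorithm: maximize f(x) = x subject to x ≤ 1 by gradient ascent on the barrier objective x + η·log(1 − x), whose optimum x = 1 − η sits strictly inside the feasible set, giving the "pessimistic" constraint satisfaction.

```python
def barrier_ascent(eta=0.1, lr=0.01, steps=5000):
    # Gradient ascent on the barrier objective x + eta * log(1 - x);
    # the barrier term blows up near the boundary x = 1, so the iterate
    # stays strictly feasible throughout.
    x = 0.0
    for _ in range(steps):
        grad = 1.0 - eta / (1.0 - x)   # d/dx [x + eta * log(1 - x)]
        x += lr * grad
    return x

x_opt = barrier_ascent()   # converges to 1 - eta = 0.9
```

    Shrinking η trades conservatism for optimality: the barrier optimum 1 − η approaches the true constrained optimum x = 1.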
    Stochastic Methods in Variational Inequalities: Ergodicity, Bias and Refinements. (arXiv:2306.16502v1 [stat.ML])
    For min-max optimization and variational inequalities problems (VIP) encountered in diverse machine learning tasks, Stochastic Extragradient (SEG) and Stochastic Gradient Descent Ascent (SGDA) have emerged as preeminent algorithms. Constant step-size variants of SEG/SGDA have gained popularity, with appealing benefits such as easy tuning and rapid forgiveness of initial conditions, but their convergence behaviors are more complicated even in rudimentary bilinear models. Our work endeavors to elucidate and quantify the probabilistic structures intrinsic to these algorithms. By recasting the constant step-size SEG/SGDA as time-homogeneous Markov Chains, we establish a first-of-its-kind Law of Large Numbers and a Central Limit Theorem, demonstrating that the average iterate is asymptotically normal with a unique invariant distribution for an extensive range of monotone and non-monotone VIPs. Specializing to convex-concave min-max optimization, we characterize the relationship between the step-size and the induced bias with respect to the Von-Neumann's value. Finally, we establish that Richardson-Romberg extrapolation can improve proximity of the average iterate to the global solution for VIPs. Our probabilistic analysis, underpinned by experiments corroborating our theoretical discoveries, harnesses techniques from optimization, Markov chains, and operator theory.  ( 2 min )
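    The Richardson-Romberg refinement in the last abstract can be sketched with a synthetic bias model (the quadratic bias expansion below is an illustrative assumption, not the paper's iterates): run the method at step sizes γ and 2γ and combine the two averages so the first-order bias cancels.

```python
def biased_estimate(gamma):
    # Stand-in for the long-run average iterate at constant step size gamma:
    # the true solution 1.0 plus a synthetic step-size-dependent bias.
    return 1.0 + 0.5 * gamma + 0.25 * gamma**2

gamma = 0.1
plain = biased_estimate(gamma)                                 # 1.0525
rr = 2 * biased_estimate(gamma) - biased_estimate(2 * gamma)   # 0.995
# The extrapolated estimate cancels the O(gamma) bias term, leaving only
# an O(gamma^2) error -- the refinement the abstract refers to.
```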
    Trajectory Poisson multi-Bernoulli mixture filter for traffic monitoring using a drone. (arXiv:2306.16890v1 [cs.CV])
    This paper proposes a multi-object tracking (MOT) algorithm for traffic monitoring using a drone equipped with optical and thermal cameras. Object detections on the images are obtained using a neural network for each type of camera. The cameras are modelled as direction-of-arrival (DOA) sensors. Each DOA detection follows a von Mises-Fisher distribution, whose mean direction is obtained by projecting a vehicle position on the ground to the camera. We then use the trajectory Poisson multi-Bernoulli mixture filter (TPMBM), which is a Bayesian MOT algorithm, to optimally estimate the set of vehicle trajectories. We have also developed a parameter estimation algorithm for the measurement model. We have tested the accuracy of the resulting TPMBM filter in synthetic and experimental data sets.  ( 2 min )
    Likelihood-free neural Bayes estimators for censored inference with peaks-over-threshold models. (arXiv:2306.15642v2 [stat.ME] UPDATED)
    Inference for spatial extremal dependence models can be computationally burdensome in moderate-to-high dimensions due to their reliance on intractable and/or censored likelihoods. Exploiting recent advances in likelihood-free inference with neural Bayes estimators (that is, neural estimators that target Bayes estimators), we develop a novel approach to construct highly efficient estimators for censored peaks-over-threshold models by encoding censoring information in the neural network architecture. Our new method provides a paradigm shift that challenges traditional censored likelihood-based inference for spatial extremes. Our simulation studies highlight significant gains in both computational and statistical efficiency, relative to competing likelihood-based approaches, when applying our novel estimators for inference of popular extremal dependence models, such as max-stable, $r$-Pareto, and random scale mixture processes. We also illustrate that it is possible to train a single estimator for a general censoring level, obviating the need to retrain when the censoring level is changed. We illustrate the efficacy of our estimators by making fast inference on hundreds-of-thousands of high-dimensional spatial extremal dependence models to assess particulate matter 2.5 microns or less in diameter (PM2.5) concentration over the whole of Saudi Arabia.  ( 2 min )
    Automatic Calibration and Error Correction for Large Language Models via Pareto Optimal Self-Supervision. (arXiv:2306.16564v1 [cs.CL])
    Large language models (LLMs) have demonstrated remarkable capabilities out of the box for a wide range of applications, yet accuracy still remains a major growth area, especially in mission-critical domains such as biomedicine. An effective method to calibrate the confidence level on LLM responses is essential to automatically detect errors and facilitate human-in-the-loop verification. An important source of calibration signals stems from expert-stipulated programmatic supervision, which is often available at low cost but has its own limitations such as noise and coverage. In this paper, we introduce a Pareto optimal self-supervision framework that can leverage available programmatic supervision to systematically calibrate LLM responses by producing a risk score for every response, without any additional manual efforts. This is accomplished by learning a harmonizer model to align LLM output with other available supervision sources, which would assign higher risk scores to more uncertain LLM responses and facilitate error correction. Experiments on standard relation extraction tasks in biomedical and general domains demonstrate the promise of this approach, with our proposed risk scores highly correlated with the real error rate of LLMs. For the most uncertain test instances, dynamic prompting based on our proposed risk scores results in significant accuracy improvement for off-the-shelf LLMs, boosting GPT-3 results past state-of-the-art (SOTA) weak supervision and GPT-4 results past SOTA supervised results on challenging evaluation datasets.  ( 2 min )
    The Local Approach to Causal Inference under Network Interference. (arXiv:2105.03810v4 [econ.EM] UPDATED)
    We propose a new nonparametric modeling framework for causal inference when outcomes depend on how agents are linked in a social or economic network. Such network interference describes a large literature on treatment spillovers, social interactions, social learning, information diffusion, disease and financial contagion, social capital formation, and more. Our approach works by first characterizing how an agent is linked in the network using the configuration of other agents and connections nearby as measured by path distance. The impact of a policy or treatment assignment is then learned by pooling outcome data across similarly configured agents. We demonstrate the approach by proposing an asymptotically valid test for the hypothesis of policy irrelevance/no treatment effects and bounding the mean-squared error of a k-nearest-neighbor estimator for the average or distributional policy effect/treatment response.  ( 2 min )
    Causal inference for the expected number of recurrent events in the presence of a terminal event. (arXiv:2306.16571v1 [stat.ME])
    We study causal inference and efficient estimation for the expected number of recurrent events in the presence of a terminal event. We define our estimand as the vector comprising both the expected number of recurrent events and the failure survival function evaluated along a sequence of landmark times. We identify the estimand in the presence of right-censoring and causal selection as an observed data functional under coarsening at random, derive the nonparametric efficiency bound, and propose a multiply-robust estimator that achieves the bound and permits nonparametric estimation of nuisance parameters. Throughout, no absolute continuity assumption is made on the underlying probability distributions of failure, censoring, or the observed data. Additionally, we derive the class of influence functions when the coarsening distribution is known and review how published estimators may belong to the class. Along the way, we highlight some interesting inconsistencies in the causal lifetime analysis literature.  ( 2 min )
    Superiority of GNN over NN in generalizing bandlimited functions. (arXiv:2206.05904v7 [cs.LG] UPDATED)
    Graph Neural Network (GNN) with its ability to integrate graph information has been widely used for data analyses. However, the expressive power of GNN has only been studied for graph-level tasks but not for node-level tasks, such as node classification, where one tries to interpolate missing nodal labels from the observed ones. In this paper, we study the expressive power of GNN for the said classification task, which is in essence a function interpolation problem. Explicitly, we derive the number of weights and layers needed for a GNN to interpolate a bandlimited function in $\mathbb{R}^d$. Our result shows that the number of weights needed to $\epsilon$-approximate a bandlimited function using the GNN architecture is far smaller than the best known one for a fully connected neural network (NN) - in particular, one only needs $O((\log \epsilon^{-1})^{d})$ weights using a GNN trained by $O((\log \epsilon^{-1})^{d})$ samples to $\epsilon$-approximate a discretized bandlimited signal in $\mathbb{R}^d$. The result is obtained by drawing a connection between the GNN structure and the classical sampling theorems, making our work the first attempt in this direction.  ( 3 min )
    UTOPIA: Universally Trainable Optimal Prediction Intervals Aggregation. (arXiv:2306.16549v1 [stat.ME])
    Uncertainty quantification for prediction is an intriguing problem with significant applications in various fields, such as biomedical science, economic studies, and weather forecasts. Numerous methods are available for constructing prediction intervals, such as quantile regression and conformal predictions, among others. Nevertheless, model misspecification (especially in high-dimension) or sub-optimal constructions can frequently result in biased or unnecessarily-wide prediction intervals. In this paper, we propose a novel and widely applicable technique for aggregating multiple prediction intervals to minimize the average width of the prediction band along with coverage guarantee, called Universally Trainable Optimal Predictive Intervals Aggregation (UTOPIA). The method also allows us to directly construct predictive bands based on elementary basis functions. Our approach is based on linear or convex programming which is easy to implement. All of our proposed methodologies are supported by theoretical guarantees on the coverage probability and optimal average length, which are detailed in this paper. The effectiveness of our approach is convincingly demonstrated by applying it to synthetic data and two real datasets on finance and macroeconomics.  ( 2 min )
    Neural networks can detect model-free static arbitrage strategies. (arXiv:2306.16422v1 [q-fin.CP])
    In this paper we demonstrate both theoretically as well as numerically that neural networks can detect model-free static arbitrage opportunities whenever the market admits some. Due to the use of neural networks, our method can be applied to financial markets with a high number of traded securities and ensures almost immediate execution of the corresponding trading strategies. To demonstrate its tractability, effectiveness, and robustness we provide examples using real financial data. From a technical point of view, we prove that a single neural network can approximately solve a class of convex semi-infinite programs, which is the key result in order to derive our theoretical results that neural networks can detect model-free static arbitrage strategies whenever the financial market admits such opportunities.  ( 2 min )
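    The model-free static-arbitrage condition that the paper's networks are trained to detect can be checked directly in a toy market. The prices, payoffs, and portfolio below are made-up illustrative data, not the paper's setup: a static arbitrage is a portfolio with nonnegative payoff in every state and strictly negative cost today.

```python
# Toy two-asset, two-state market: both assets pay 1 in every state, but
# asset B costs more than asset A -- a textbook static arbitrage.
prices = [1.0, 1.2]                      # cost of each asset today
payoffs = [[1.0, 1.0],                   # payoffs[state][asset]
           [1.0, 1.0]]

def is_static_arbitrage(w):
    # nonnegative payoff in every state AND strictly negative cost today
    nonneg = all(sum(a * x for a, x in zip(row, w)) >= 0 for row in payoffs)
    cost = sum(p * x for p, x in zip(prices, w))
    return nonneg and cost < 0

is_static_arbitrage([1.0, -1.0])   # True: buy A, short B, pocket 0.2
```

    At market scale this membership test becomes a search over portfolios, which is where the paper's semi-infinite programming formulation and neural approximation come in.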

  • Open

    I love this art style!
    ​ https://preview.redd.it/nkqhl3ynn89b1.png?width=768&format=png&auto=webp&s=9030640323813257a97f256fd8a67f1a492d7e9a Steps to replicate: Typed the following on Nack Ai: /imagine ((A full body portrait of a handsome 18 year old boy with an afro, in trendy black hoody)) ((Disney style animation, handsome face, rounded eyes, digital painting, fine exquisite details, fan art)) ((Background is a beautiful well lit cyberpunk alley)) ((graffiti on walls)) maximalist, cartoon-style.🌃🌃 (cgi-style)(ray-tracing lighting)((splashed multicolored paint))((multicolor paint dripping)), gamer headphones 🎧 Fantastical special effects in the background. ((Multicolor paint streaks)) rainbow prism bubbles floating in the air. | negate: | width: 768 | height: 1024 | images: 4 | prompt_weight: 7.0 | steps: 60 | model: Dreamshaper v6 | prompt_enhancement: Yes submitted by /u/rich_awo [link] [comments]  ( 8 min )
    Is there an AI music generator that can produce music similar to a particular song?
    For example, if I want to create a song similar to Fireflies by Owl City to use with a YouTube video. Not a remix but a different song inspired by another. submitted by /u/createdtexan [link] [comments]  ( 8 min )
    What is the most modern technique for generating a video from an image+driver video?
    NVIDIA's Maxine and Live Portrait microservices look very promising, but they are still in early access. And if you dig into Live Portrait, it seems to be built off of a three-year-old GitHub repo. Example from the repo: https://i.redd.it/emp3xe70s79b1.gif Another big one I have found is the first-order-model repo, which is a little older and seems to do about the same kind of thing: https://i.redd.it/cb8z7xf1s79b1.gif I would think there would be a lot of advancements in the last three years. Are these the best libraries to use if you want something off the shelf? Is there a name for this particular type of technology or class of algorithms? submitted by /u/blender_noob_5000 [link] [comments]  ( 8 min )
    AI — weekly megathread!
    This week in AI - partnered with aibrews.com feel free to follow their newsletter News & Insights Microsoft has launched AI-powered shopping tools in Bing search and Edge, including AI-generated buying guides which automatically aggregate product specifications and purchase locations for user queries​, and AI-generated review summaries that provide concise overviews of online product reviews [Details]. Salesforce AI Research released XGen-7B, a new open-source 7B LLM trained on 8K input sequence length for 1.5T tokens [Details| Huggingface| GitHub]. Researchers present DreamDiffusion, a novel method for generating high-quality images directly from brain EEG signals without the need to translate thoughts into text [Paper]. Google announced the first Machine Unlearning Challenge hoste…  ( 10 min )
    Is there an AI tool for creating/combining a note?
    So I just got a task from my boss to create a note containing specific details/checkmarks. Some context: I work in sales (telecommunications) and we use a note where we can write down specific information, for example how many devices a customer uses, how many people are using the Wi-Fi connection, etc. Now he asked me if I could take the current note we use (it’s a double-sided DIN A4 paper) and add some new information/checkmarks on it, so that the employees ask all the relevant questions. Now my question is, is there an AI tool which is able to create/combine a PDF based on the content I’ll provide to it (in form of a PDF, image or text)? submitted by /u/FlowerpotxD [link] [comments]  ( 8 min )
    I'm looking for a comic which shows a female warrior illegally infiltrating an AI facility where a lot of humans are hooked into machines, in the style of The Matrix, numb with pleasure, and she chooses to do the same
    I am trying to find it but I can't. Does anyone have it? It had multiple panels of the story just like in a comic book submitted by /u/br_shadow [link] [comments]  ( 8 min )
    Talk to me! (Conversational AI in adult VR game)
    submitted by /u/VR_hot [link] [comments]  ( 8 min )
    AI Psychedelic Jewelry [workflow in comments]
    submitted by /u/PeePeePeePooPooPooo [link] [comments]  ( 8 min )
    Is this feasible?
    I’ve been thinking a lot lately about social media (Meta) and other large tech companies (Google) that profit off of our screen time by showing ads and collecting data to sell. I know this is not a new topic, but I had an idea for going on offense instead of just defense with engines like DuckDuckGo. Is it possible with AI to build an app that automatically searches, clicks, and interacts with content on a social media site, or performs searches and interactions on a search engine? Ideally a person could set the parameters, e.g. kittens, how to xyz, rainbows, muscle cars, etc. This could run in the background while we are sleeping and the device is charging. After a time, the algorithm would produce ads catered to these searches as the profiles these tech companies build on us start to morph into whatever we pick. As they morph, the value proposition of selling our data to advertisers lessens as the integrity of the data falls apart. These companies, as many know, apply dark patterns and other psychological tactics to keep us engaged and disconnected from the real world. Thoughts on whether this sort of program could be possible? submitted by /u/Pristine-Balance1827 [link] [comments]  ( 9 min )
    Snoop Dogg announces the cage match between Elon and Mark | Midjourney + Charactr API
    submitted by /u/3nd4u [link] [comments]  ( 8 min )
    A model which allows itself to learn?
    I downloaded my DNA from a company where I made an ancestry test, was wondering if GPT-4 would be able to find health issues or anything useful from it, obviously it is too long as an input and got an error. Now I have this question, let's say that GPT-4 did not have any idea how to do it, what model would be able to get something as a request then understand that it can not do that, then trigger something in it which starts gathering and making new neurons when necessary and new connections with the appropriate neurons so that it eventually can make sense of what I ask. submitted by /u/NotLegal69 [link] [comments]  ( 8 min )
    AI system devises first optimizations to sorting code in over a decade
    submitted by /u/eberkut [link] [comments]  ( 8 min )
    How close are we to developing a 'conscience' for AI?
    Are there any companies/startups/researchers working on this problem? Any good solutions proposed? It seems like somebody must be close to it by now. submitted by /u/medphysics [link] [comments]  ( 8 min )
    Meta explains the AI behind its social media algorithms
    submitted by /u/dahmedahe [link] [comments]  ( 8 min )
    Captivating Symbiosis: Exploring the Enigmatic Dance of Human Creativity and Artificial Intelligence
    I hope you're all having a fantastic day filled with mind-boggling insights! Today, I want to delve into the mesmerizing realm where human creativity intertwines with the vast potential of artificial intelligence. While the concept of AI has been at the forefront of our discussions, it's time to shift our focus and explore the profound symbiosis between human artists and their AI counterparts. As we witness the remarkable advancements in AI-generated art, it's easy to get lost in the magic that unfolds on our screens. With every brushstroke, every note played, and every word written, AI algorithms have been infiltrating the creative sphere, making their mark in ways we couldn't have imagined before. One could argue that AI art is a mere mimicry of human expression, lacking the profound e…  ( 9 min )
    i wonder if language models like phi-1 with quality training data could be used to train better language models by using its emergent properties alone
    https://the-decoder.com/microsofts-tiny-phi-1-language-model-shows-the-importance-of-data-quality-in-ai-training/ ​ submitted by /u/loopy_fun [link] [comments]  ( 8 min )
    Pick Any 2 TV Shows To Crossover
    Since AI is getting quite popular, I can see it being useful for people to generate content which they aren’t getting. For instance.. I was thinking about a Grey’s Anatomy crossover with Desperate Housewives. Since that’s never likely going to happen, perhaps AI can generate something! Which shows or even characters would you like to see crossover? I’d also like to throw out the Z Nation vs Walking Dead teams doing a brawl between the two in the style of ‘Deadliest Warrior’ submitted by /u/parryfinkle [link] [comments]  ( 8 min )
    Where would I find someone who I could pay to make a fake audio for me?
    I’m trying to make a realistic sounding audio using eleven labs, but the voice sounds a little fake and I want there to be background noises over the audio like wind and clothing rustling, how can I get this done? Thanks! submitted by /u/drydrip126 [link] [comments]  ( 8 min )
  • Open

    DJL with RL?
    Has anyone used the Deep Java Library with RL? I found an example (TicTacToe), but it doesn't quite work as the example is from an old (0.60) version and the libraries are much newer. I'm playing with the library (I really prefer statically typed languages), but having trouble finding docs related to RL. If anyone has experience or can point me at some resources that discuss using DJL with RL, that'd be great. submitted by /u/scprotz [link] [comments]  ( 8 min )
    Is there any experience replay sampling method that takes into consideration frequency?
    Hello, I am researching the compression of convolutional neural networks using reinforcement learning. I use images of the dataset to compress each of the layers in a model. One of the issues I am facing is that it is getting stuck in local minima due to randomness and because the datasets have 80k images. I select a random batch of images for every test as using all 80k images consumes too much time (I fear this also might be the cause of my issue). I have tried some agents like Dueling DQN, DDPG and PPO. For all of them, I see that it gets stuck despite having explored better actions. I implemented a prioritized experience replay (PER) so that samples that have higher temporal difference errors have a higher probability of being selected. Nonetheless, I see that the Dueling DQN never chose the best sequence of actions despite it being found during exploration. Only 3 actions are taken so the episode has a short length. I was thinking about using Upper Confidence Bound every few epochs to select the best actions that might not be selected due to having low td error. Any thoughts on this? Also, if anyone would be kind enough to point out if there is any experience replay that takes into consideration the number of times each sample was selected, I would greatly appreciate it. Thanks in advance. ​ ​ submitted by /u/ElvishChampion [link] [comments]  ( 9 min )
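    On the frequency question: one simple variant of prioritized replay (a home-grown sketch, not an established published method) is to discount each transition's priority by the number of times it has already been replayed, so frequently sampled transitions gradually yield to under-sampled ones:

```python
import random

class CountAwareReplay:
    """Replay buffer whose sampling weight mixes TD-error priority with a
    penalty on how often each transition has already been replayed."""
    def __init__(self):
        self.buffer, self.priority, self.count = [], [], []

    def add(self, transition, td_error):
        self.buffer.append(transition)
        self.priority.append(abs(td_error) + 1e-6)  # avoid zero weight
        self.count.append(0)

    def sample(self, k):
        # discount priority by replay frequency, in the spirit of UCB
        weights = [p / (1 + c) for p, c in zip(self.priority, self.count)]
        idxs = random.choices(range(len(self.buffer)), weights=weights, k=k)
        for i in idxs:
            self.count[i] += 1
        return [self.buffer[i] for i in idxs]
```

    Replacing `1 / (1 + c)` with an additive `sqrt(log(total) / (1 + c))` bonus would give the UCB-style revisiting described in the post.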
    DDPG fails to learn a seemingly simple 2D room navigation problem
    Hello everyone, I am currently working on training a DDPG (Deep Deterministic Policy Gradient) agent for a navigation problem in a 2D room. The room is a square measuring 30m x 30m, with a star representing the central location. The goal is to control two players within the room using DDPG. Each player can move a fixed distance of 0.5m in any direction, and the DDPG actor network outputs two angles (ranging from 0 to 2*pi) for each player, representing the direction of their movements. The new coordinates (x_i, y_i) of each player are updated based on these angles (with clamping to keep the players in the room). I have designed a simple reward function for each step, where the reward is calculated as follows: reward = - (r_1 ** 2 + r_2 ** 2) / normalization_factor. Here, r_1 and r_2 r…  ( 9 min )
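    For reference, a minimal sketch of the environment as described in the post; the goal location at the room's center and the normalization factor are guesses, since the post truncates before specifying them:

```python
import math

ROOM, STEP = 30.0, 0.5
GOAL = (15.0, 15.0)  # assuming the star sits at the room's center

def env_step(players, angles):
    # Move each player 0.5 m along its action angle and clamp to the room.
    new_players = []
    for (x, y), a in zip(players, angles):
        x = min(max(x + STEP * math.cos(a), 0.0), ROOM)
        y = min(max(y + STEP * math.sin(a), 0.0), ROOM)
        new_players.append((x, y))
    # reward = -(r_1^2 + r_2^2) / normalization_factor, as in the post;
    # 2 * ROOM^2 is a guessed normalization keeping rewards in [-1, 0]
    sq = sum((x - GOAL[0])**2 + (y - GOAL[1])**2 for x, y in new_players)
    return new_players, -sq / (2 * ROOM**2)
```

    One thing worth checking in a setup like this: if the actor's two output angles are produced by a bounded activation mapped to [0, 2π], the discontinuity at the 0/2π wrap-around can itself stall learning; outputting (cos, sin) pairs per player is a common workaround.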
    Future Rewards in n-step semi-gradient TD for estimating a value function
    I am trying to understand the following algorithm from Sutton and Barto. https://preview.redd.it/cdchk04e269b1.png?width=1240&format=png&auto=webp&s=e6cd2ffa91121dbc0c1396c12c0e19eddb6b4597 I am confused about this line https://preview.redd.it/62b7zlnq269b1.png?width=404&format=png&auto=webp&s=b4acb2bb61069720aedbd3e45236278351efdea4 which unrolls to https://preview.redd.it/ct1uxe5t269b1.png?width=1178&format=png&auto=webp&s=4c33c008c40326ecf7ac99f76109605979765a32 My question is, at timestep t, we don't have access to any of the future rewards. How is the summation from (t, T) computed in this case, since we don't have access to future rewards? submitted by /u/ElendirThreadripper [link] [comments]  ( 8 min )
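    One way to see the timing in that pseudocode: the update performed at time t targets the earlier step τ = t − n + 1, so the rewards in the sum (R_{τ+1} through R_{τ+n}) have all been observed by the time the sum is formed. A sketch that computes the same n-step returns, assuming rewards are stored as they arrive:

```python
def n_step_returns(rewards, values, gamma, n):
    """Compute the n-step return G_{tau:tau+n} for every time step tau.
    rewards[0..T-1] hold R_1..R_T; values[t] is the current estimate
    V(S_t). The return for tau is only formed once tau + n steps have
    elapsed, so every reward it needs is already known."""
    T = len(rewards)
    G = []
    for tau in range(T):
        end = min(tau + n, T)
        g = sum(gamma**(i - tau - 1) * rewards[i - 1]
                for i in range(tau + 1, end + 1))
        if tau + n < T:              # bootstrap only before episode end
            g += gamma**n * values[tau + n]
        G.append(g)
    return G
```

    In other words, the algorithm runs n steps behind the environment: at each wall-clock step t it updates V(S_{t−n+1}), never a state whose return still depends on unseen rewards.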
    Libraries for Offline RL and Off-Policy Evaluation?
    Hey, so I'm pretty new to the reinforcement learning world, but not to machine/deep learning. I've done some simple one-file implementations on relatively simple RL tasks, but I'm looking to do some offline learning where I can't simulate my environment and so I also want to do off-policy evaluation. The main one-stop shop that I've seen so far is RLlib with all its offline learning algorithms and off-policy evaluation (OPE) techniques implemented and ready to go, but I've heard negative reviews about the performance overhead (I don't mind the learning curve and I have access to a large cluster). I've also seen d3rlpy and tianshou, but they don't really have OPE implementations - so I would be open to using them if anyone has any recs where to find things like OPE. Anyone have any suggestions where to go? This is out of my comfort zone with deep learning, so I figure it would be better to reach out here to everyone with so much more experience. submitted by /u/EchoMyGecko [link] [comments]  ( 8 min )
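    For a sense of what the OPE side involves, here is a minimal sketch of the ordinary importance-sampling estimator; the function names and toy logged data are illustrative, not from any of the libraries mentioned:

```python
def is_estimate(logged, target_prob):
    """Ordinary importance-sampling estimate of a target policy's value
    from logged (state, action, reward, behavior_prob) tuples.
    target_prob(state, action) returns the target policy's probability
    of the logged action."""
    total = 0.0
    for state, action, reward, b_prob in logged:
        total += (target_prob(state, action) / b_prob) * reward
    return total / len(logged)

# Bandit-style toy example: behavior picks actions uniformly (prob 0.5),
# the target policy always picks action 1.
logged = [("s", 1, 1.0, 0.5), ("s", 0, 0.0, 0.5),
          ("s", 1, 1.0, 0.5), ("s", 0, 0.0, 0.5)]
target = lambda s, a: 1.0 if a == 1 else 0.0
is_estimate(logged, target)   # 1.0: each matching sample is weighted 2x
```

    The library-grade estimators (weighted IS, doubly robust, etc.) are refinements of this same ratio-weighting idea with lower variance.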
    Diambra.ai: The brand-new platform for training and competing RL algorithms on retro-fighting games
    submitted by /u/DIAMBRA_AIArena [link] [comments]  ( 8 min )
    RL algorithms that establish causation through experiment?
    Are there any algorithms in RL which proceed in a way to establish causation through interventions in the environment? The interventions would proceed by carrying out experiments in which confounding variables are included and then excluded. This process of trying combinations of variables would continue until the entire collection of experiments allow for the isolation of causes. By interventions, I am roughly referring to their use in chapter §6.3 of this book https://library.oapen.org/handle/20.500.12657/26040 If this has not been formalized within RL, why hasn't it been tried? Is there some fundamental aspect of RL which is violated by doing this kind of learning? submitted by /u/moschles [link] [comments]  ( 8 min )
    Methods to keep agents inside grid world.
    Hi experts, I'd like to ask for your experience to help with keeping the agents inside a grid world. How do you normally construct your environment so this issue doesn't happen? The structure of my environment is like: class env(): `self.x = int(); self.y = int()` `grid_world = (x,y)` `tmp = [x,y] # agent's location will be copied from "step" function before agent takes a move` `def move(action):` `# takes an 8-directional move based on an action` `def step(action):` `tmp = agent's location.copy()` `move(action)` `if agent_location[0] >= self.x or agent_location[1] >= self.y:` `do the same as before. put back the tmp to the observation space and take a random move.` Now my problem is that whatever I do, the agent will leave the grid. The problem is not about giving it a negative reward so it knows not to leave the grid. But in general, how do you keep your agent inside the environment? I would appreciate your reply so much in advance. submitted by /u/vincent0110 [link] [comments]  ( 8 min )
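    A common fix is to make out-of-bounds states unreachable by clamping inside the step function itself, rather than detecting a violation afterwards and resampling. A sketch (variable names are illustrative):

```python
def step(pos, action, width, height):
    # Apply one of 8 compass moves and clamp to the grid, so out-of-bounds
    # states simply cannot occur -- no reject-and-retry logic needed.
    moves = [(0, 1), (1, 1), (1, 0), (1, -1),
             (0, -1), (-1, -1), (-1, 0), (-1, 1)]
    dx, dy = moves[action]
    x = min(max(pos[0] + dx, 0), width - 1)
    y = min(max(pos[1] + dy, 0), height - 1)
    return (x, y)
```

    With clamping, a move "into the wall" just leaves the agent in place along that axis; the agent can still learn from the (usually zero or negative) reward that the action was wasted, but the observation space stays valid by construction.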
    RL for finance
    I want to create a lightweight implementation utilizing reinforcement learning (RL) for the FinQA dataset, which is essentially question answering over financial documents, but the computational resources required (an NVIDIA TITAN with 24GB GDDR6) seem pretty high for a personal project. Are there any other papers or ways to reduce the resources so that it becomes compatible with a GTX 1660 Ti? Any leads/project ideas in the finance sector with RL would be of great help as well. Thanks submitted by /u/Priyanshu1_ [link] [comments]  ( 8 min )
    List secret ML repositories to contribute?
    I believe that open-source contributions will significantly impact candidate selection processes. Although I'm relatively new to contributing to ML models and data science, I'm eager to get involved. It would greatly benefit me if individuals could share their bookmarked projects that I could contribute to. Thank you!! submitted by /u/trafalgar28 [link] [comments]  ( 8 min )
    Update on last post; snake CNN DQN game still not learning
    This is now my updated code: my GitHub gist. As per suggestions, I've implemented a replay buffer as well as copying my weights to a target network to stabilize learning; however, the rewards spike up and down and do not improve overall. I've also altered the CNN model so that it's more robust. After training it for about 10 hours the rewards are not improving. I was wondering if anyone would be willing and able to give me some pointers on where I could go with this? Prior post: prior post I'm about to throw in the towel, start over, and abandon the CNN in favor of the head-relative snake state like most people use. submitted by /u/bleak-terminal [link] [comments]  ( 8 min )
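    For reference, a replay buffer in the spirit described above can be very small. A generic sketch (not the poster's gist; the transition field layout is illustrative):

    ```python
    import random
    from collections import deque

    class ReplayBuffer:
        """Minimal experience replay: a bounded FIFO of transitions
        sampled uniformly at random for each training batch."""
        def __init__(self, capacity=100_000):
            self.buffer = deque(maxlen=capacity)  # old transitions fall off the left

        def push(self, state, action, reward, next_state, done):
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size):
            batch = random.sample(self.buffer, batch_size)
            # transpose: one tuple per field (states, actions, rewards, next_states, dones)
            return list(zip(*batch))

        def __len__(self):
            return len(self.buffer)
    ```

    Two things worth checking against this shape: that sampling is uniform over the whole buffer (not just recent frames), and that the target network is synced on a fixed schedule rather than every step.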
    Jamie Simon, UC Berkeley: On theoretical principles for how neural networks learn and generalize
    submitted by /u/thejashGI [link] [comments]  ( 8 min )
  • Open

    Cayley graphs in Mathematica
    The previous post mentioned the group S4, the group of all permutations of a set with four elements. This post will show a way to visualize this group. The Mathematica command CayleyGraph[ SymmetricGroup[4], VertexLabels -> Placed["Name", Center], VertexSize -> 0.4] generates the graph below. This is an interesting image, but what does it mean? The […] Cayley graphs in Mathematica first appeared on John D. Cook.  ( 5 min )
    Permutations and centralizers in Mathematica
    I had a fleeting thought about how little I use group theory when I realize I used it just this week. A couple days ago I needed to know which permutations of 4 elements commute with reversal. If r takes a sequence and reverses it, I need to find all permutations p such that p ∘ […] Permutations and centralizers in Mathematica first appeared on John D. Cook.  ( 5 min )
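    The excerpt's computation also translates directly into a brute-force check outside Mathematica; for example, in Python with 0-indexed permutations of four elements (for S4, the centralizer of reversal has 8 elements, since reversal is a product of two disjoint transpositions):

    ```python
    from itertools import permutations

    def compose(p, q):
        # (p ∘ q)(i) = p[q[i]]
        return tuple(p[q[i]] for i in range(len(p)))

    r = (3, 2, 1, 0)  # reversal of a 4-element sequence

    # all p in S4 with p ∘ r == r ∘ p
    centralizer = [p for p in permutations(range(4)) if compose(p, r) == compose(r, p)]
    ```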
    Floors and roots
    I recently stumbled across the identity while reading [1]. Here ⌊x⌋ is the floor of x, the largest integer not larger than x. My first thought was that this was hard to believe. Although the floor function is very simple, its interactions with other functions tends to be complicated. I was surprised that such a […] Floors and roots first appeared on John D. Cook.  ( 5 min )
  • Open

    Don't use AI detectors for anything important
    I've noted before that because AI detectors produce false positives, it's unethical to use them to detect cheating. Now there's a new study that shows it's even worse. Not only do AI detectors falsely flag human-written text as AI-written, the way in  ( 6 min )
    Bonus: Several ways of looking at a sandwich hole
    In an effort to make AI detectors stop classifying a paragraph from my book as AI generated, I experimented with a few different ways of getting chatGPT to reword it. Here are some of my favorites.  ( 3 min )
  • Open

    [D] Career
    Hi Reddit, I'm a computer science student and I'm confused about which branch to choose. I'm interested in AI, but this field is very wide. Honestly, I want to learn something that will make me money, so please help me choose a branch in AI (or something better) and tell me briefly about some jobs or ways to make money with it. submitted by /u/saf_saf_ [link] [comments]  ( 8 min )
    [R] Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture - Meta Ai Yann LeCun et al - Code + checkpoints released!
    Paper: https://arxiv.org/abs/2301.08243 Github: https://github.com/facebookresearch/ijepa Includes code + checkpoints! Abstract: This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations. We introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images. The idea behind I-JEPA is simple: from a single context block, predict the representations of various target blocks in the same image. A core design choice to guide I-JEPA towards producing semantic representations is the masking strategy; specifically, it is crucial to (a) sample target blocks with sufficiently large scale (semantic), and to (b) use a sufficiently informative (spatially distributed) context block. Empirically, when combined with Vision Transformers, we find I-JEPA to be highly scalable. For instance, we train a ViT-Huge/14 on ImageNet using 16 A100 GPUs in under 72 hours to achieve strong downstream performance across a wide range of tasks, from linear classification to object counting and depth prediction. submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [P] Hybrid TSP Solver
    Hello everyone, this is my inaugural post in this community, and I'm excited to share with you my master's degree project. It's an innovative TSP (Traveling Salesman Problem) solver that blends the 1Tree branch-and-bound algorithm by Held and Karp with a Graph Convolutional Network. I've made the entire codebase available on this Github repository, along with detailed information and the good results obtained. I'm posting this to receive feedback on my work, including your thoughts on potential areas for improvement and whether you believe some of these concepts could also be applied to Concorde. submitted by /u/Lorenzos98 [link] [comments]  ( 8 min )
    [N] Training LLMs with AMD MI250 GPUs and MosaicML
    https://www.mosaicml.com/blog/amd-mi250 AMD datacenter GPUs are finally viable for ML training! Hope this will increase supply and reduce GPU prices and training costs in the long term. submitted by /u/ml_hardware [link] [comments]  ( 8 min )
    [D] How are folks daisy chaining ML models today?
    It seems quite common to have different models trained on different tasks where the output of one model is the input to the next. Does any tooling exist to support this? Is it a frustrating enough problem that folks would be willing to outsource it and just get an inference endpoint? Totally just fielding ideas here for climb.dev submitted by /u/vanlifecoder [link] [comments]  ( 8 min )
    [D] How do you keep up with ML news on Twitter? Suggestions for people to follow?
    A few of my friends always talk about how they find so many papers and state of the art research just from Twitter, but they work in CV whereas I work more with stats / privacy / other ML approaches (eg not much DL). I've followed people who have published papers I have liked, but many times they either are not on twitter or are highly inactive. Occasionally they'll announce a paper, but typically our interests only partially overlap so the papers I do get are only tangential to my research. Likewise, all the big / "creator" ML people mainly post about LLMs / diffusion models which also aren't relevant to my research. Is there a better way to narrow down twitter to better keep up with the state of the art in your subfield than just following people who publish interesting papers as you find them? submitted by /u/Amun-Aion [link] [comments]  ( 8 min )
    [R] Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models
    submitted by /u/gabloa [link] [comments]  ( 8 min )
    [P] Need project idea on many-particle system
    Basically the title. I'm getting started with OpenGL (not sure if it's needed, since there's the freud library for generating particles), but I would like to work on particle systems, and any niche machine learning use case is welcome. I have a lot of time to complete it, so I'm open to research as well. Any other applications in shaders, ray tracing, screen space, or anything with pixels are welcome too. submitted by /u/Remote_Ad5597 [link] [comments]  ( 8 min )
    [D] OptiTalk
    Yo guys, what do you think of this website? It's similar to character.ai, though it looks much cleaner in my opinion. https://optitalk.net/ submitted by /u/CommissarCheeseball [link] [comments]  ( 8 min )
    [P]: Generate tweets about a current topic, based on a usernames style
    I built this demo with Haystack Agents, using a WebSearch tool to account for topics the LLM may not have knowledge of (think the latest news, etc.) and a TwitterRetriever tool, a custom Haystack node that retrieves the latest tweets (15 in this case) for the given username. Result: you can ask something like: What would the twitter user dog_feelings say about the titan submarine? The demo is available on HuggingFace and the code is also hosted on GitHub: https://github.com/TuanaCelik/what-would-mother-say submitted by /u/tuanacelik [link] [comments]  ( 8 min )
    Need a roadmap to create ai in teaching industry [D]
    I'm thinking about creating an AI for teaching, where the AI marks students' work. Students write on an iPad or another tablet with a stylus and the AI marks it and provides tailored questions. For example, if a student gets a certain question wrong, the AI generates similar questions for them. If a student is weak in math it provides easier questions; if the student is strong, harder ones. The AI also explains the specific concepts where the student made a mistake, provides verbal explanations to students, and, based on the student's performance, creates a report as well. Can someone give me a roadmap to create such an AI with the above functions? submitted by /u/andyoh115 [link] [comments]  ( 8 min )
    [P] Secret Santa: Serve SantaCoder for code completion without exposing code IP using BlindBox
    Hi there! We have recently released an example where we use BlindBox, our open-source confidential deployment solution, to serve SantaCoder for confidential code completion. Would be happy to have your feedback! Gradio demo: https://huggingface.co/spaces/mithril-security/Santacoder-demo Blog post: https://blog.mithrilsecurity.io/ai-assisted-code-generation-with-privacy-guarantees-securely-deploy-santacoder-with-blindbox/ Docs: https://blindbox.mithrilsecurity.io/en/integration-task/docs/how-to-guides/santacoder/?ref=blog.mithrilsecurity.io GitHub: https://github.com/mithril-security/blindbox/ submitted by /u/Separate-Still3770 [link] [comments]  ( 8 min )
    [R] Image dataset for detecting defects in the manufacturing of screws
    Can anyone suggest where to find/buy a dataset of that kind? I'd like to run some experiments on a production line without starting from a generic dataset. Thank you! submitted by /u/jiiins [link] [comments]  ( 8 min )
    [D] Switch Net and Stochastic Gradient Descent
    [ Removed by Reddit on account of violating the content policy. ] submitted by /u/AI462QQQ [link] [comments]  ( 8 min )
    [D] Books for ML - different levels, any suggestions?
    Hello :) TLDR, please suggest some interesting books for ML that cover introductory, intermediate, and advanced topics. Our department has a surplus, so I suggested using part of this money to add new books to the department library. Here is the list of books that I gathered to add: The Elements of Statistical Learning. Second Edition. Trevor Hastie, Robert Tibshirani, and Jerome Friedman. Reinforcement Learning: An Introduction. Second Edition. Richard S. Sutton and Andrew G. Barto. Deep Learning. Illustrated Edition. Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Second Edition. Steven L. Brunton and J. Nathan Kutz. Mathematics for Machine Learning. Deisenroth, A. Aldo Faisal, and C…  ( 9 min )
    [Discussion] How to use LDA effectively?
    I have scraped Glassdoor reviews. The reviews are split into two columns: Pros: what the employees liked about the company. Cons: what the employees didn't like about the company. After all the necessary text pre-processing (removing stopwords, punctuation, and symbols, performing stemming and lemmatization, and visualizing word clouds), I thought about applying topic modeling algorithms to understand my data better, since word clouds are not enough. For this, to my limited knowledge, I should test multiple algorithms; I chose LDA, NMF, HDP & LSA. To find the best-performing algorithm, I chose to calculate these metrics: Coherence Score Topic Diversity Topic Coverage Topic Stability Perplexity I started with LDA, where I varied the number of topics and plotted the results. I was wondering if I'm doing this correctly so far, because I'll do the same for the other 3 algorithms; one thing I know is that the number of topics is my hyperparameter, and in this case I have to rely on these different metrics to reach the most optimal result. This is what I plotted at the end of LDA: https://preview.redd.it/h8i2knika49b1.png?width=1589&format=png&auto=webp&s=5b8a74fc1bf1626d7174ac902c3b7e16e420bdbd submitted by /u/Miserable_Run_6377 [link] [comments]  ( 9 min )
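    Of the metrics listed, topic diversity is the easiest to sanity-check by hand: it is commonly defined as the fraction of unique words among the top-N words across all topics. A minimal sketch under that common definition (verify against whichever paper's definition you are following):

    ```python
    def topic_diversity(topics, top_n=25):
        """Fraction of unique words among the top-N words of every topic.
        `topics` is a list of word lists, one per topic, most probable first.
        1.0 means no overlap between topics; values near 0 mean redundant topics."""
        top_words = [w for topic in topics for w in topic[:top_n]]
        return len(set(top_words)) / len(top_words)
    ```

    Computing it per candidate number of topics, alongside coherence, gives a second axis for the elbow plot shown above.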
    [P] Reinforcement learning Question Answering
    I want to create a lightweight implementation utilizing reinforcement learning (RL) for the FinQA dataset, which is essentially question answering over financial documents, but the computational resources required (an NVIDIA TITAN with 24 GB GDDR6) seem pretty high for a personal project. Are there any papers or ways to reduce the resource requirements so it becomes compatible with a GTX 1660 Ti? Any leads/project ideas in the finance sector with RL would be of great help as well. submitted by /u/Priyanshu1_ [link] [comments]  ( 8 min )
    [R] HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution
    Paper https://arxiv.org/abs/2306.15794 Blog https://hazyresearch.stanford.edu/blog/2023-06-29-hyena-dna Colab https://colab.research.google.com/drive/1wyVEQd4R3HYLTUOXEEQmp_I8aNC_aLhL?usp=sharing Abstract Genomic (DNA) sequences encode an enormous amount of information for gene regulation and protein synthesis. Similar to natural language models, researchers have proposed foundation models in genomics to learn generalizable features from unlabeled genome data that can then be fine-tuned for downstream tasks such as identifying regulatory elements. Due to the quadratic scaling of attention, previous Transformer-based genomic models have used 512 to 4k tokens as context (<0.001% of the human genome), significantly limiting the modeling of long-range interactions in DNA. In addition, t…  ( 9 min )
    [D] Anyone here actively use chatGPT, bard, copilot, etc. while coding and reviewing PRs? How do you tend to use it?
    Anyone here actively use chatGPT, bard, copilot, etc. while coding and reviewing PRs? How do you tend to use it? submitted by /u/MLtinkerer [link] [comments]  ( 8 min )
    [P] Transform Your Ideas into Reality with DemoGPT: A New Way to Prototype Using 🦜️🔗LangChain
    Hi Folks! Ever wished for a way to transform your ideas into prototypes rapidly, using simple textual prompts? Look no further! I'm thrilled to introduce DemoGPT, an innovative tool that empowers you to create demos using Large Language Models (LLMs) quickly. Here's the magic: DemoGPT leverages LangChain to generate LLM-based application code from your ideas. You can prototype and get a jump start on your app code, no matter your coding expertise level. Rapid prototyping, accessible to everyone, seamless integration with LangChain for code generation, and being open source are some of the highlights of DemoGPT. Check out this game-changer tool on GitHub and start bringing your ideas to life: https://github.com/melih-unsal/DemoGPT Can't wait to see what you create! Best, Melih submitted by /u/demogpt [link] [comments]  ( 8 min )
  • Open

    Generalizing Adversarial Robustness with Confidence-Calibrated Adversarial Training in PyTorch
    Taking adversarial training from this previous article as baseline, this article introduces a new, confidence-calibrated variant of adversarial training that addresses two significant flaws: First, trained with L∞ adversarial examples, adversarial training is not robust against L2 ones. Second, it incurs a significant increase in (clean) test error. Confidence-calibrated adversarial training addresses these problems by encouraging lower confidence on adversarial examples and subsequently rejecting them. The post Generalizing Adversarial Robustness with Confidence-Calibrated Adversarial Training in PyTorch appeared first on David Stutz.  ( 10 min )
  • Open

    Failing to train and avoid overfit on noisy training data
    I have added some Gaussian noise to the CIFAR10 training and test sets. I am using VGG16 and ResNet34 as the models to be trained. Under normal training conditions, where the standard CIFAR10 is used (without any Gaussian noise), training happens as expected. The model does not overfit when trained using the right transforms and data augmentation; I am able to achieve 95% training accuracy and 91% test accuracy. But if I add minor Gaussian noise to the entire training and test set and train the model using the same transforms, the model overfits considerably. The training accuracy reaches 98% whereas the test accuracy does not go past 80%. My initial reasoning for this was that I used color jittering for the transforms during normal training (when using clean data). But since colo…  ( 9 min )
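    One thing worth checking in this setup is whether the Gaussian noise is sampled fresh every epoch or fixed once per image: fixed noise is just extra per-image detail the network can memorize, which encourages exactly the overfitting described. A sketch of a resamplable noise transform in numpy (the sigma value and the [0, 1] clipping range are assumptions):

    ```python
    import numpy as np

    def add_gaussian_noise(images, sigma=0.05, seed=None):
        """Add zero-mean Gaussian noise to images in [0, 1] and clip back
        to the valid range. Call with a fresh seed (or seed=None) each epoch
        so the noise is resampled rather than memorizable."""
        rng = np.random.default_rng(seed)
        noisy = images + rng.normal(0.0, sigma, size=images.shape)
        return np.clip(noisy, 0.0, 1.0)
    ```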
    MNIST neural network can't improve after 3000 Epochs
    Hello, I am new to neural networks and am trying to make a multilayer perceptron network. I'm not sure whether the problem is with my forward pass or my backward pass. I use a learning rate of 0.02, 2 hidden layers of 30 neurons each, and a batch size of 32. With this setup it achieves an accuracy of 9.8%, which is about chance level (random guessing). I've tried lowering the LR, changing the size of the network, and other things. All the code is in Java. Here is the code for the backpropagation: public Deltas BackPropagation(Prediction Out){ //δ Important Greek letter double[] δ = Mult((Subtract(Out.Outputs, Out.Expected )), ActivationPrime(Add(multiply(OutputWeights, Out.HiddenLayers[Out.HiddenLayers.length - 1]), OutputBias))); double[][] OutputWeightDeltas = new double[OutputWeights.length][OutputWeights[0].length]; …  ( 9 min )
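    Exactly-chance accuracy (≈10% on MNIST) often points at the output-layer delta. The `(Outputs - Expected) * ActivationPrime(z)` form in the Java snippet is the textbook delta for a sigmoid output with MSE loss; but with a softmax output and cross-entropy loss (the usual MNIST setup) the activation-derivative factor must be dropped, because it is already absorbed into the loss gradient. A numpy sketch of that simplified delta (this assumes softmax + cross-entropy, which may differ from the poster's setup):

    ```python
    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())  # shift for numerical stability
        return e / e.sum()

    def output_delta(z, expected):
        """With softmax output + cross-entropy loss, the output-layer delta
        simplifies to (softmax(z) - expected); no extra sigma'(z) factor."""
        return softmax(z) - expected
    ```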
  • Open

    When computer vision works more like a brain, it sees more like people do
    Training artificial neural networks with data from real brains can make computer vision more robust.  ( 10 min )
    Educating national security leaders on artificial intelligence
    Experts from MIT’s School of Engineering, Schwarzman College of Computing, and Sloan Executive Education educate national security leaders in AI fundamentals.  ( 9 min )
    Researchers teach an AI to write better chart captions
    A new dataset can help scientists develop automatic systems that generate richer, more descriptive captions for online charts.  ( 10 min )
  • Open

    Democratize computer vision defect detection for manufacturing quality using no-code machine learning with Amazon SageMaker Canvas
    Cost of poor quality is top of mind for manufacturers. Quality defects increase scrap and rework costs, decrease throughput, and can impact customers and company reputation. Quality inspection on the production line is crucial for maintaining quality standards. In many cases, human visual inspection is used to assess the quality and detect defects, which can […]  ( 10 min )
  • Open

    Data assets in the AI era
    Here’s a hypothesis: Smart data (enriched with metadata that makes it connectable with other data, Tinker Toy style)  might be fungible in ways that dumb data isn’t.  For exchange and reuse purposes, can data be “fungible” at all? Consider this excerpt from an explainer on the Cointelegraph.com site: “Fungible tokens or assets are divisible and… Read More »Data assets in the AI era The post Data assets in the AI era appeared first on Data Science Central.  ( 21 min )
    The music of the Riemann Hypothesis: Sound Generation in Python
    Not long ago, I published an article entitled “The Sound that Data Makes”. The goal was turning data — random noise in this case — into music. The hope was that by “listening” to your data, you could gain a different kind of insights, not conveyed by visualizations or tabular summaries. This article is a… Read More »The music of the Riemann Hypothesis: Sound Generation in Python The post The music of the Riemann Hypothesis: Sound Generation in Python appeared first on Data Science Central.  ( 22 min )

  • Open

    Non-sentient AI future
    Watching all these AI-generated ads and whatnot, I find them incredibly disturbing. Serious uncanny valley vibes from them. I keep having this horrible thought about the future. Humans no longer exist (for whatever reason, not necessarily wiped out by AI) but our world continues much as it did before, except every "person" is an AI that, through machine learning over unimaginable numbers of observations of real human behavior, can perfectly replicate a functioning human being in appearance and actions. You could live in this world your whole life and never know the difference. Aliens could contact the AI and never know either. Except they are not sentient. They have no self-awareness or actual thinking at all. Just perfect impressions of actual humans, doing things and reacting to things based on their learned data. It's absolutely terrifying. I'm reminded of the Westworld quote "If you can't tell the difference, does it matter?" Maybe it doesn't. submitted by /u/cheezum5000 [link] [comments]  ( 8 min )
    [Give us Feedback] Recast AI: Turn your want-to-read articles into rich audio summaries.
    Hey folks, I am one of the co-founders of Recast, an AI-powered service that helps you enjoy any text article on the web in an easier, faster way. In brief, Recast turns any text you care about into a short, news-style explainer podcast using AI. Much more than a summary, our virtual hosts actually have a back-and-forth conversation, which makes all the difference. You can submit your own, or discover what other users have already recast. Recasts are shorter than reading the whole article would be, plus listening is often a much more palatable way of consuming knowledge, as you can listen to a Recast on a walk, on the commute, while doing the dishes, etc. Here’s a couple of Recasts you can listen to directly: [NYTimes] In Norway, the Electric Vehicle Future Has Already Arrived [DataDrivenInvestor] Dear Sam Altman- There was never an era of making models bigger [MIT Tech Review] China isn’t waiting to set down rules on generative AI [The Egg And The Rock] In cosmology, all our errors lean the same way. The implications are... interesting We'd love to get your feedback! submitted by /u/ekurutepe [link] [comments]  ( 8 min )
    The face of someone having the perfect poo according to AI
    submitted by /u/just_quit_smoking [link] [comments]  ( 8 min )
    How long is the musiclm waitlist?
    I joined the waitlist on the 14th of May and I've still been waiting; the idea is so good and I want to have fun with it. Someone I know joined the waitlist YESTERDAY and already has access, so I don't know what's happening. Any explanation? submitted by /u/Ctoousiwcasasi [link] [comments]  ( 8 min )
    If Elon Musk and Mark actually fight, this is probably how the conversation would go.
    submitted by /u/3nd4u [link] [comments]  ( 8 min )
    My boss has told me to read a 200 page annual report and identify every sentence that has a "numerical or empirical statement" in it and highlight. I have to think this would be easy to do with AI but am hitting a wall. Any tools that you would recommend I check out?
    I think he's looking for statements ranging from the very obvious (revenue rose 10%) to the less clear (we opened a new facility during the fiscal year), if that makes any difference. Thank you for your help; I'm facing about 3 days of reading right now lol. submitted by /u/Dr_Work_Cr_Soul [link] [comments]  ( 8 min )
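    Before reaching for an LLM, a first pass with plain regular expressions catches most of the obvious cases (digits, percent signs, spelled-out numbers), leaving the borderline sentences for manual or LLM review. A rough sketch (the number-word list and sentence splitter are illustrative, not exhaustive):

    ```python
    import re

    # words like "two" or "million" also count as numerical statements
    NUMBER_WORDS = r"\b(one|two|three|four|five|six|seven|eight|nine|ten|hundred|thousand|million|billion)\b"
    PATTERN = re.compile(r"\d|%|" + NUMBER_WORDS, re.IGNORECASE)

    def flag_numerical_sentences(text):
        """Return the sentences that contain a digit, a percent sign,
        or a spelled-out number (crude sentence split on .!? boundaries)."""
        sentences = re.split(r"(?<=[.!?])\s+", text)
        return [s for s in sentences if PATTERN.search(s)]
    ```

    This misses softer empirical claims ("we opened a new facility"), which is where a second, model-based pass over the unflagged remainder would earn its keep.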
    Advice on AI Needed
    Hi everyone! Although I am not a programmer of any sort, I've been wanting to start a venture that relies on the use of AI. I need to make a bot that can read pictures and use its database to categorize/diagnose what it is seeing. I've made some contacts and have vague backing for the data so far, but right now building the bot is my utmost priority. I've never really owned or created any machine learning system, and I don't know where to start or whether I even have the capability. submitted by /u/XxxMeowMeowPurrxxX [link] [comments]  ( 8 min )
    AI Buildings Series [workflow in comments]
    submitted by /u/PeePeePeePooPooPooo [link] [comments]  ( 8 min )
    AI to review, critique and advise for design files?
    Hi there, I'm wondering if anyone knows of an AI that I can give a graphic plus some context (e.g. "this will be used as a printed business card of size x * y") and that will give me hints and pointers on what it thinks could be done better? Thanks in advance submitted by /u/meerjungfrauman [link] [comments]  ( 8 min )
    Have you tried any AI chat language learning tools? What did you think?
    I just tried a language learning tool called "gopenpal.ai" where you can chat with an AI in your target language. It has built-in translations and you can click on words to see their definitions. It also corrects your writing. I liked that before you enter the chat you can choose the difficulty level you want the conversation to be at (from A1 to C2). I thought it was pretty good, but it could use some more features, like links to online dictionaries for each word you click on (like you get on LingQ). Also, as a beginner Italian learner, I don't know how correct the AI's messages and its corrections are. Has anyone here tried similar sites? What did you think? submitted by /u/krampster2 [link] [comments]  ( 8 min )
    Jason Bateman in American Psycho
    Used Roop And Elevenlabs and edited in Premiere Pro submitted by /u/AthleteEducational63 [link] [comments]  ( 8 min )
    Is there an AI where I can give it lyrics and tell it what style song I want and have it produce a music file?
    I'm working on a project for school in which I'm creating a fake band webpage, and I really wanted to give it that extra oomph by adding some AI-generated music. It needs to be metal music sung by female vocalists. submitted by /u/VirtualAdeptGirl [link] [comments]  ( 8 min )
    Sexually explicit open sourced language models? Do they exist? (asking for a friend of friend)
    Hey yall. Wondering if anyone knows of any models that exist that are at par with gpt3.5 (or close to) Purely for research purposes. submitted by /u/chriscarmy [link] [comments]  ( 8 min )
    Video AI Learning path ?
    Hi, I'm a complete newbie to this. Can someone please share the learning paths / documents / tutorials etc. to train something related to videos? E.g. feeding in good-looking sky images > identifying the sky in a given video > replacing it with a better sky, or similar tasks. submitted by /u/sdw23 [link] [comments]  ( 8 min )
    One-Minute Daily AI News 6/28/2023
    AlphaGo from Google’s DeepMind AI lab made history by defeating a champion player of the board game Go. Now Demis Hassabis, DeepMind’s cofounder and CEO, says his engineers are using techniques from AlphaGo to make an AI system dubbed Gemini that will be more capable than that behind OpenAI’s ChatGPT.[1] As anyone who’s seen depictions of AI in movies like 2001: A Space Odyssey and Alien will know, you simply don’t put your life control system in the hands of a sentient computer. Now, though, NASA is seemingly going against everything Hollywood has taught us about AI space assistants by developing a system that will allow astronauts to use a natural-language ChatGPT-like interface in space.[2] A team of researchers, including professors from the University of Montana and UM Western, have found that OpenAI’s GPT-4 scored in the top 1% on the Torrance Tests of Creative Thinking (TTCT), matching or outperforming humans in the creative abilities of fluency, flexibility, and originality.[3] Shares of U.S. chipmakers fell on Wednesday following reports that the Biden administration was planning new curbs on the export of computing chips for artificial intelligence to China as early as July.[4] Sources: [1] https://www.wired.com/story/google-deepmind-demis-hassabis-chatgpt/ [2] https://interestingengineering.com/innovation/nasa-chatgpt-ai-lunar-gateway [3] https://nbcmontana.com/news/local/um-um-western-researchers-find-openais-gpt-4-outperforms-humans-in-creativity-tests [4] https://www.reuters.com/markets/europe/chip-stocks-drop-us-mulls-fresh-curbs-ai-access-china-2023-06-28/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    a Voice changer worth investing in?
    I am working on a cartoon show for my brand's YouTube channel, and now that I have designed all the characters and written a few episodes (about 6), I am having a tough time choosing the voices for each character. I have been going through multiple voice apps, like Voice and Voicemod, etc. To be honest, I am not very impressed. People have been hyping them up, but they sound very fake and broken, they skip words, and they're just not the best quality. And they all fall short on the freemium version. I am going to have to purchase a premium version, and I am okay with that. I was just wondering if anyone has purchased any of the voice generation apps and can recommend the top one that's making the most strides forward. submitted by /u/itsokmydadisrich [link] [comments]  ( 8 min )
    Has anyone heard any updates on UBI reasearch
    So a while back I heard that OpenAI was researching UBI, but I haven't heard ANYTHING about what they found, whether they just asked GPT-3 what we ask and called that the "research", or anything else. And I'm not hearing much about any other research on UBI. Has anyone heard any updates on this from OpenAI? What about other places? submitted by /u/crua9 [link] [comments]  ( 8 min )
  • Open

    "Monte-Carlo Planning in Large POMDPs", Silver & Veness 2010
    submitted by /u/gwern [link] [comments]  ( 8 min )
    Can I make a customized robot in OpenAI gym?
    I am running a code project based on OpenAI Gym. However, the project initially uses ant robots, which makes it less convincing for later research. I want to replace the ant robots with more realistic models, for example a TurtleBot or a Clearpath robot. Does anyone know if this is possible? If yes, where should I start? Should I build a model from scratch, or do models already exist that I could use directly? Any suggestion is appreciated! submitted by /u/Iton608 [link] [comments]  ( 8 min )
    How much do people care about Gym/gymnasium environment compatibility?
    I've written my own multiagent grid world environment in C with a nice real-time visualiser (with OpenGL) and am thinking of publishing it as a library. The step function works basically the same as in Gym, but that's about where the similarities end. Due to the way I implemented it, it will probably be a pain to get it fully compatible with Gym. Do people really care that much about Gym compatibility? submitted by /u/askii717 [link] [comments]  ( 8 min )
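    Full Gym/Gymnasium compatibility usually matters less than matching the two method signatures most tooling actually calls: `reset()` returning `(obs, info)` and `step()` returning the 5-tuple `(obs, reward, terminated, truncated, info)`. A thin Python adapter over the native library can provide that without touching the C code (the `backend` interface here is hypothetical, standing in for whatever the C bindings expose):

    ```python
    class GymStyleShim:
        """Thin adapter exposing a Gymnasium-style API over a native backend.
        `backend` is assumed to have reset() -> obs and
        step(action) -> (obs, reward, done)."""
        def __init__(self, backend):
            self.backend = backend

        def reset(self, seed=None, options=None):
            obs = self.backend.reset()
            return obs, {}  # Gymnasium's reset returns (observation, info)

        def step(self, action):
            obs, reward, done = self.backend.step(action)
            # Gymnasium splits `done` into terminated / truncated
            return obs, reward, done, False, {}
    ```

    With a shim like this, users get to plug the environment into SB3-style training loops, which is mostly what "Gym compatibility" buys in practice.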
    Invariance of episode length - rewards
First, I use a Q-learning based method (DDQN). So, I have a task that can take up to M steps, and I have a proxy value function that actually provides my reward (it was trained before my value function). Every K timesteps, I get a reward according to the proxy value function on the relevant state. It sounds like a strange method, but it helps with sample efficiency, and the task is very challenging to learn from scratch. Now, the issue is: I can pick rewards that are strictly positive, strictly negative, or mixed. When I use mixed rewards, I observe weird behaviors, such as doing the task well at first and then starting to perform badly, probably due to negative expected rewards in the future (it's better to die now, because it's going to be bad later). If I use positive rewards, things seem more stable. However, I think the agent then tends to prolong episodes. I thought about dividing the returns by the trajectory (episode) length; is that a problematic approach? Are there any papers you are aware of that discuss this issue? Going greedy does not help me: I see that 0.99-1 discount factors work better for my use case (probably due to some bias and representation issues and a lack of long-term planning; I don't know, it's all black magic, but states can look similar to the network while rewards are still very different towards the end of the task). TLDR: I want the agent to be invariant to episode length; how should I shape my rewards? Also, I understand this question is very env-specific, but I would mainly appreciate references to papers related to this issue; I am not looking for "a solution" LOL. submitted by /u/pyepyepie [link] [comments]  ( 9 min )
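The length bias the post describes is easy to see in a toy calculation: with strictly positive per-step rewards, a longer episode always has a larger undiscounted return, so the agent is rewarded for dragging episodes out, while a per-step average (an average-reward view) removes exactly that bias. A stdlib sketch with illustrative numbers:

```python
# Toy illustration of the episode-length confound: two episodes with the same
# per-step reward quality, one 10x longer. The undiscounted sum prefers the
# long episode; the per-step average is invariant to length.

def undiscounted_return(rewards):
    return sum(rewards)

def per_step_return(rewards):
    return sum(rewards) / len(rewards)

short = [1.0] * 10            # solves the task in 10 steps
long = [1.0] * 100            # same per-step quality, 10x longer

assert undiscounted_return(long) > undiscounted_return(short)   # length bias
assert per_step_return(long) == per_step_return(short)          # invariant
```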
    Is there an implementation of non-deep RL algorithms based on Stable Baselines3?
Hi, I'm currently working on satisficing RL, which can roughly be described as aiming for a fixed reward (say, 10) instead of maximizing the reward. For my current approach, I have built upon the Stable Baselines3 (SB3) Deep Q-Learning algorithm. However, I occasionally encounter some unexpected results. To determine whether these issues are related to deep learning or to our satisficing framework, I would like to test satisficing using classical Q-learning. Has anyone already utilized the SB3 framework to implement Q-learning? Alternatively, do you know of a good Q-learning implementation that is compatible with Gymnasium/Gym discrete environments? Thank you. submitted by /u/Butanium_ [link] [comments]  ( 8 min )
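For the classical baseline the post asks about, tabular Q-learning is small enough to write directly rather than forcing it into SB3. A self-contained sketch on a toy chain environment (not SB3 or Gymnasium code; all names and hyperparameters are illustrative):

```python
import random

# Self-contained tabular Q-learning on a 5-state chain (move right to reach
# the goal). The update is the classical
# Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

N_STATES, ACTIONS = 5, (0, 1)          # action 0 = left, 1 = right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

def step(state, action):
    nxt = min(N_STATES - 1, max(0, state + (1 if action == 1 else -1)))
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

random.seed(0)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
for _ in range(500):                    # training episodes
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection.
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a: Q[(s, a)])
        s2, r, done = step(s, a)
        target = r if done else r + GAMMA * max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# Greedy policy after training: always move right.
greedy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
```

Swapping the toy `step` for a Gymnasium discrete environment's `env.step` is a few lines of glue, which keeps deep-learning effects entirely out of the comparison.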
    MADDPG: reward drops fast after updating ac network
I'm using MADDPG to do computation offloading. But the reward drops fast once I start sampling from the replay buffer. https://preview.redd.it/ukk82xkeav8b1.png?width=801&format=png&auto=webp&s=8ca2cc8a68dd4d9048d02ba58df30df0afbe42d8

```python
class critic(abstract_agent):

    def __init__(self, obs_shape, act_shape):
        super().__init__()
        self.LRelu = nn.LeakyReLU(0.01)
        self.linear_c1 = nn.Linear(act_shape + obs_shape, 64)
        self.linear_c2 = nn.Linear(64, 64)
        self.linear_c = nn.Linear(64, 1)

    def forward(self, obs_input, act_input):
        x_cat = self.LRelu(self.linear_c1(torch.cat([obs_input, act_input], dim=1)))
        x = self.LRelu(self.linear_c2(x_cat))
        x = self.linear_c(x)
        return x


class actor(abstract_agent):

    def __init__(self, num_input, action_size):
        super().__init__()
        s…
```
 ( 9 min )
  • Open

    [D] Toy Datasets and Models to experiment with Knowledge Distillation
    I am planning to conduct research on knowledge distillation, specifically exploring certain mathematical characteristics of the technique. Consequently, I will need to conduct numerous experiments. Considering my constraint of a 30-hour weekly limit on Kaggle, I am aiming to minimize training time. Could you kindly recommend suitable models and datasets for my research? submitted by /u/MazenAmria [link] [comments]  ( 8 min )
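Before scaling to Kaggle, the core distillation objective itself can be prototyped on toy logits in plain Python. A sketch of the Hinton-style temperature-scaled loss (stdlib only; this assumes the standard KL formulation, not any particular paper's variant):

```python
import math

# Temperature-scaled knowledge distillation loss: KL(teacher || student) on
# temperature-softened distributions, scaled by T^2 as in the original
# formulation. Useful for quick sanity checks before GPU experiments.

def softmax(logits, T=1.0):
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, T=2.0):
    p = softmax(teacher_logits, T)   # teacher "soft targets"
    q = softmax(student_logits, T)   # student predictions
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

identical = kd_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])   # -> 0.0
mismatch = kd_loss([0.1, 0.2, 0.3], [2.0, 0.5, -1.0])     # > 0
```

For small-scale experiments, pairing a compact student (e.g. a small CNN) with a mid-size teacher on CIFAR-10 or Fashion-MNIST keeps single runs well under an hour.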
    [R] Adversarial Robust Deep Reinforcement Learning Requires Redefining Robustness
    https://ojs.aaai.org/index.php/AAAI/article/view/26009/25781 submitted by /u/ml_dnn [link] [comments]  ( 8 min )
    [D] Without the hype: How do current state-of-the-art LLMs benefit society?
    This is in contrast to obvious harms of current LLMs like uses for information warfare and election interference and „far-out“ benefits of potentially more powerful future models. submitted by /u/optimized-adam [link] [comments]  ( 8 min )
    [R] Training Transformers with 4-bit Integers - Haocheng Xi et al Tsinghua University - 2.2 times faster than the FP16 counterparts and speeds up the training by up to 35.1%!
Paper: https://arxiv.org/abs/2306.11987 Github: https://github.com/xijiu9/Train_Transformers_with_INT4 Abstract: Quantizing the activation, weight, and gradient to 4-bit is promising to accelerate neural network training. However, existing 4-bit training methods require custom numerical formats which are not supported by contemporary hardware. In this work, we propose a training method for transformers with all matrix multiplications implemented with the INT4 arithmetic. Training with an ultra-low INT4 precision is challenging. To achieve this, we carefully analyze the specific structures of activation and gradients in transformers to propose dedicated quantizers for them. For forward propagation, we identify the challenge of outliers and propose a Hadamard quantizer to suppress the outliers. For backpropagation, we leverage the structural sparsity of gradients by proposing bit splitting and leverage score sampling techniques to quantize gradients accurately. Our algorithm achieves competitive accuracy on a wide range of tasks including natural language understanding, machine translation, and image classification. Unlike previous 4-bit training methods, our algorithm can be implemented on the current generation of GPUs. Our prototypical linear operator implementation is up to 2.2 times faster than the FP16 counterparts and speeds up the training by up to 35.1%. submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
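For readers unfamiliar with the numerics in the abstract above, a plain symmetric INT4 quantize/dequantize round trip shows the range involved. This is a deliberately simplified stdlib sketch, not the paper's Hadamard quantizer:

```python
# Signed 4-bit integers cover [-8, 7], so a per-tensor scale maps floats into
# that grid and back. The round-trip error of an in-range value is bounded by
# scale / 2, which is why outliers (which inflate the scale) are the challenge
# the paper's Hadamard quantizer targets.

def quantize_int4(xs):
    scale = max(abs(x) for x in xs) / 7.0 or 1.0   # avoid a zero scale
    q = [max(-8, min(7, round(x / scale))) for x in xs]
    return q, scale

def dequantize_int4(q, scale):
    return [qi * scale for qi in q]

xs = [0.3, -1.2, 0.0, 2.1, -0.7]
q, scale = quantize_int4(xs)
xhat = dequantize_int4(q, scale)
max_err = max(abs(a - b) for a, b in zip(xs, xhat))   # bounded by scale / 2
```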
    [P] MultiEL: Multilingual Entity Linking model by BELA model
Hey everyone, I created a simple Python package that uses the BELA model for entity linking in 98 languages!!! It uses the Bi-encoder Entity Linking Architecture (BELA) from Meta, and the model is MIT-licensed!!! GitHub: https://github.com/wannaphong/MultiEL Demo: https://github.com/wannaphong/MultiEL/blob/main/notebook/test.ipynb submitted by /u/wannaphong [link] [comments]  ( 8 min )
    [Project] Prototype - (Auto) Codebase to Video for a given perspective. Test case: Pybads and Project Management
Video: https://youtu.be/jNwRqcrz6rg A video created by an app built on top of the BabyDragon package (still in dev). It is still in the very early stages of development and the pipeline is currently a bit janky, but I would love to hear any feedback/suggestions, even if it is just nitpicking. Thank you. Source Code for test repo: https://github.com/acerbilab/pybads submitted by /u/NovelspaceOnly [link] [comments]  ( 8 min )
    [N] Beware of Unreliable Data in Model Evaluation: Case Study w/ Flan T5 LLM
Hello Redditors! LLMs have undoubtedly established themselves as the vanguards of natural language processing, consistently pushing the limits of language comprehension and generation, thus solidifying their prominent position in the field. It's pretty common now for data scientists and ML engineers to validate the quality of the training data being fed into these LLMs, but what about the test data used to evaluate them? I spent some time playing around with the FLAN-T5 open-source LLM from Google Research, and I discovered that noisy test/evaluation data can actually cause you to choose sub-optimal prompts. Given two prompts A and B, I found multiple cases where prompt A performed better on the observed (noisy) test data, yet worse on the high-quality test data. In reality, this means that you would choose A as the "best prompt" when prompt B is actually the better one. I also showed the accuracy difference to be significant via McNemar's test. I wrote up a quick article that explains my methodology and how I used data-centric AI to automatically clean the noisy test data in order to ensure optimal prompt selection. Let me know what you think! submitted by /u/cmauck10 [link] [comments]  ( 9 min )
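Since the post above leans on McNemar's test, here is the chi-square form of the statistic (with continuity correction) as a stdlib sketch; the counts below are hypothetical:

```python
# McNemar's chi-square statistic (with continuity correction) for comparing
# two prompts on the same test examples. Only the discordant pairs matter:
# b = examples prompt A got right and B got wrong, c = the reverse.

def mcnemar_statistic(b, c):
    return (abs(b - c) - 1) ** 2 / (b + c)

# Hypothetical counts: A-right/B-wrong on 10 examples, B-right/A-wrong on 30.
stat = mcnemar_statistic(10, 30)
# Compare against the chi-square(1 df) critical value 3.841 at alpha = 0.05.
significant = stat > 3.841
```

An exact binomial test on the discordant pairs is the usual alternative when b + c is small.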
[D] Cluster Algorithm - Successive Approach
A company offers 5 types of services (j = 5), where S = [1, ..., j]. To carry out the different services, the company has 5 types of equipment (i = 5), where E = [1, ..., i]. I would like to use the table presented below to perform a cluster analysis. However, I would like to perform the cluster analysis successively: first cluster the equipment according to the services offered, then cluster the services according to the equipment used, and repeat the process until it converges to a common cluster result. Thus, the ultimate goal is to obtain a cluster that is a compromise between the two approaches described previously. Is there any clustering algorithm that is capable of accomplishing this task? submitted by /u/Virtual-Hall8707 [link] [comments]  ( 8 min )
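One way to make the successive idea concrete is to alternate nearest-centroid passes, each time describing one axis by sums over the other axis's current clusters. A stdlib sketch on a toy block matrix (deterministic seed centroids for reproducibility, all numbers illustrative); scikit-learn's `SpectralCoclustering` tackles the joint row/column problem more rigorously:

```python
# Alternately cluster equipment (rows) and services (columns): in each pass,
# one axis is described by its coordinate sums within the other axis's
# current clusters, so the two clusterings inform each other until stable.

def assign(vectors, centroids):
    def dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return [min(range(len(centroids)), key=lambda c: dist(v, centroids[c]))
            for v in vectors]

def collapse(vectors, labels, k):
    # Describe each vector by its sums within the other axis's clusters.
    return [[sum(x for x, l in zip(v, labels) if l == c) for c in range(k)]
            for v in vectors]

# Toy equipment x service usage matrix with two obvious blocks.
M = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
cols = [list(col) for col in zip(*M)]

row_labels = assign(M, [M[0], M[2]])          # initial pass on raw profiles
for _ in range(5):                            # alternate until stable
    col_profiles = collapse(cols, row_labels, 2)
    col_labels = assign(col_profiles, [col_profiles[0], col_profiles[2]])
    row_profiles = collapse(M, col_labels, 2)
    row_labels = assign(row_profiles, [row_profiles[0], row_profiles[2]])
```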
    [P] AI image generation without copyright infringement
    Yesterday, news broke of Microsoft and ChatGPT being sued over their unconstrained data scraping to train ChatGPT (https://www.theregister.com/2023/06/28/microsoft_openai_sued_privacy/). In this context, I want to highlight the work we're doing to build a billion-size free-to-use Creative Commons image dataset which can be used to train generative AI models like Stable Diffusion. While we are still working on the dataset, you can already read about our approach here: https://blog.ml6.eu/ai-image-generation-without-copyright-infringement-a9901b64541c We are currently scaling up our solution using Fondant (https://github.com/ml6team/fondant) and will open-source both the pipeline and resulting dataset in the near future. Any feedback would already be highly appreciated. submitted by /u/RobbeSneyders [link] [comments]  ( 8 min )
    [R] Deep Neural Networks for Rank-Consistent Ordinal Regression Based On Conditional Probabilities
    submitted by /u/seraschka [link] [comments]  ( 8 min )
    [R] SAITS: Self-Attention-based Imputation for Time Series. Expert Systems with Applications, 219:119619, 2023.
Missing values in time series collected from the real world are common and very pesky. A new fast, state-of-the-art neural network called "SAITS" is proposed to help impute missing data in partially-observed multivariate time series. The paper has been peer-reviewed and published in the journal Expert Systems with Applications (DOI link). The full paper is available on arXiv at this URL. The code is open source on GitHub https://github.com/WenjieDu/SAITS/. 👀 NOTE: If your research lies in time-series modeling, you may also be interested in the work PyPOTS: a Python toolbox for data mining on Partially-Observed Time Series. Its full paper is available on arXiv as well https://arxiv.org/abs/2305.18811, and has been peer-reviewed and accepted by the 9th SIGKDD international workshop Mining and Learning from Time Series (MiLeTS'23). submitted by /u/WenjayDu [link] [comments]  ( 8 min )
    [D] Prep for Anthropic interview?
    Hi all, currently prepping for a research engineer interview at Anthropic, and I was wondering if there's any people here who have interviewed before who can speak to what that process was like. Haven't found too much online so far. What ML topics / study resources are most important for me to prep? (For context, I've actually almost never interviewed for ML roles before, but have done tons of quant interviews, so stats/data science/talking about my current ML research are the potential questions I'm pretty comfortable with.) How crucial is EA to Anthropic still? I wouldn't necessarily call myself an effective altruist so would that be a ding? submitted by /u/riterbi [link] [comments]  ( 8 min )
    [D] First job opinion?
Hi guys! I recently got an internship offer and I would like to know your opinions about the job. The company is one of the biggest telecommunications companies in Europe, and they want me to work on a project which consists of integrating GPT-3 into their customer service chatbot. I am excited about this opportunity, but my main goal is to learn as much as I can during my internship, and I don't know if working on that project is worth it. What do you think about it? submitted by /u/dani_blz [link] [comments]  ( 8 min )
    [R] Advice on my thesis project
Hi everyone, I am a final-year student in a master's program in matter physics, and I am doing my thesis project, which is to apply ML methods for optimization to different measurements of perovskite-based cells with a fixed structure, varying only the layer of the 2D passivant, which helps to improve the efficiency of the cell. Specifically, I collected data on 9 different cations, each with 3 different concentrations x 2 different annealing temperatures, and for each combination I have 3 devices on which I perform 16 measurements. Having obtained the full database of measurements, I filtered it (roughly 2500 measurements remained) in order to remove the measurements on bad pins (this happens often with perovskites) and basically trained 6 models (linear regression, k-nearest neighbor, random …  ( 10 min )
  • Open

    Announcing the first Machine Unlearning Challenge
    Posted by Fabian Pedregosa and Eleni Triantafillou, Research Scientists, Google Deep learning has recently driven tremendous progress in a wide array of applications, ranging from realistic image generation and impressive retrieval systems to language models that can hold human-like conversations. While this progress is very exciting, the widespread use of deep neural network models requires caution: as guided by Google’s AI Principles, we seek to develop AI technologies responsibly by understanding and mitigating potential risks, such as the propagation and amplification of unfair biases and protecting user privacy. Fully erasing the influence of the data requested to be deleted is challenging since, aside from simply deleting it from databases where it’s stored, it also requires …  ( 93 min )
    On-device diffusion plugins for conditioned text-to-image generation
    Posted by Yang Zhao and Tingbo Hou, Software Engineers, Core ML In recent years, diffusion models have shown great success in text-to-image generation, achieving high image quality, improved inference performance, and expanding our creative inspiration. Nevertheless, it is still challenging to efficiently control the generation, especially with conditions that are difficult to describe with text. Today, we announce MediaPipe diffusion plugins, which enable controllable text-to-image generation to be run on-device. Expanding upon our prior work on GPU inference for on-device large generative models, we introduce new low-cost solutions for controllable text-to-image generation that can be plugged into existing diffusion models and their Low-Rank Adaptation (LoRA) variants. Text-t…  ( 92 min )
  • Open

    Building Boba AI: Some lessons and patterns learnt in building an LLM-powered generative application
    submitted by /u/nickb [link] [comments]  ( 8 min )
    Keras
    Are there any general courses/tutorials like Kaggle courses for Python and Pandas that can give a general overview of Keras? Book recommendations are also fine. submitted by /u/bifrost44 [link] [comments]  ( 8 min )
    OpenOrca, an open-source dataset and series of instruct-tuned LLMs
    submitted by /u/nickb [link] [comments]  ( 8 min )
  • Open

    Recommend and dynamically filter items based on user context in Amazon Personalize
    Organizations are continuously investing time and effort in developing intelligent recommendation solutions to serve customized and relevant content to their users. The goals can be many: transform the user experience, generate meaningful interaction, and drive content consumption. Some of these solutions use common machine learning (ML) models built on historical interaction patterns, user demographic attributes, […]  ( 10 min )
    Interactively fine-tune Falcon-40B and other LLMs on Amazon SageMaker Studio notebooks using QLoRA
    Fine-tuning large language models (LLMs) allows you to adjust open-source foundational models to achieve improved performance on your domain-specific tasks. In this post, we discuss the advantages of using Amazon SageMaker notebooks to fine-tune state-of-the-art open-source models. We utilize Hugging Face’s parameter-efficient fine-tuning (PEFT) library and quantization techniques through bitsandbytes to support interactive fine-tuning of […]  ( 9 min )
  • Open

    Breaking cross-modal boundaries in multimodal AI: Introducing CoDi, composable diffusion for any-to-any generation
    Imagine an AI model that can seamlessly generate high-quality content across text, images, video, and audio, all at once. Such a model would more accurately capture the multimodal nature of the world and human comprehension, seamlessly consolidate information from a wide range of sources, and enable strong immersion in human-AI interactions. This could transform the […] The post Breaking cross-modal boundaries in multimodal AI: Introducing CoDi, composable diffusion for any-to-any generation appeared first on Microsoft Research.  ( 10 min )
  • Open

    What Is Robotics Simulation?
    Robots are moving goods in warehouses, packaging foods and helping assemble vehicles  — when they’re not flipping burgers or serving lattes. How did they get so skilled so fast? Robotics simulation. Making leaps in progress, it’s transforming industries all around us. Robotics Simulation Summarized A robotics simulator places a virtual robot in virtual environments to Read article >  ( 11 min )
    Calm, Cool and Creative: MUE Studio Showcases 3D Scenes ‘In the NVIDIA Studio’
    MUE Studio, founded by 3D artists Minjin Kang and Mijoo Kim, specializes in art direction, photography and 3D design for campaigns and installations.  ( 7 min )
    ‘Remnant II’ Headlines 14 Games Joining GeForce NOW in July
    It’s a jam-packed July with 14 newly supported titles in the GeForce NOW library, including Remnant II from Gunfire Games and Gearbox Publishing. Need a new adventure? Check out the nine additions streaming from the cloud this week. Plus, the Steam Summer Sale kicks off this week, and many supported titles in the GeForce NOW Read article >  ( 5 min )
  • Open

    Aligning IT infrastructure with business objectives
Introduction   In the modern business landscape, aligning IT infrastructure with business objectives is desirable and indispensable. At a time when data is being dubbed the ‘new oil,’ managing it has become critical to achieving strategic business objectives. But how can businesses leverage their IT assets to serve their goals better? This is where the alignment… The post Aligning IT infrastructure with business objectives  appeared first on Data Science Central.  ( 22 min )
    Will coding be a collaborative experience using GitHub Copilot? – Part two
In the first part of this blog, we discussed how coding could be a collaborative experience using tools like the GitHub Copilot. In the second part, we will explore the impact and significance of collaboration due to GitHub Copilot on the wider developer ecosystem. As we have seen, developers have a specific definition of collaboration… The post Will coding be a collaborative experience using GitHub Copilot? – Part two appeared first on Data Science Central.  ( 21 min )
    Navigating the future of learning: AI, certification, and higher education
The world of higher education is undergoing a transformative shift as artificial intelligence (AI) continues to reshape various aspects of our society. From classrooms to career development, the integration of AI and its impact on learning is undeniable. In this article, we will explore the intersection of AI, certification, and higher education, and delve into… The post Navigating the future of learning: AI, certification, and higher education appeared first on Data Science Central.  ( 27 min )
  • Open

    Multiplication via parabola
    Here’s a graphical way to multiply two positive numbers a and b using the parabola y = x². Start at the origin, move a units to the left, then go up vertically to the parabola, and draw a point. Go back to the origin, move b units to the right, go up vertically to the […] Multiplication via parabola first appeared on John D. Cook.  ( 5 min )
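A quick algebraic check of the construction above: the chord through (-a, a²) and (b, b²) on y = x² has slope (b² - a²)/(b + a) = b - a, so its y-intercept is a² + (b - a)·a = ab, which is exactly the value read off the y-axis.

```python
# Verify numerically: the chord through (-a, a^2) and (b, b^2) on y = x^2
# crosses the y-axis at exactly a * b.

def chord_y_intercept(a, b):
    x1, y1 = -a, a * a
    x2, y2 = b, b * b
    slope = (y2 - y1) / (x2 - x1)
    return y1 - slope * x1          # value of the line at x = 0

assert chord_y_intercept(3.0, 7.0) == 21.0
assert abs(chord_y_intercept(2.5, 4.2) - 10.5) < 1e-9
```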
  • Open

    Generating 3D Molecular Conformers via Equivariant Coarse-Graining and Aggregated Attention
Figure 1: CoarsenConf architecture. (I) The encoder $q_\phi(z| X, \mathcal{R})$ takes the fine-grained (FG) ground truth conformer $X$, RDKit approximate conformer $\mathcal{R}$ , and coarse-grained (CG) conformer $\mathcal{C}$ as inputs (derived from $X$ and a predefined CG strategy), and outputs a variable-length equivariant CG representation via equivariant message passing and point convolutions. (II) Equivariant MLPs are applied to learn the mean and log variance of both the posterior and prior distributions. (III) The posterior (training) or prior (inference) is sampled and fed into the Channel Selection module, where an attention layer is used to learn the optimal pathway from CG to FG structure. (IV) Given the FG latent vector and the RDKit approximation, the decoder $p_\theta…  ( 5 min )
  • Open

    Insights from global conversations
    We are sharing what we learned from our conversations across 22 countries, and how we will be incorporating those insights moving forward.  ( 4 min )

  • Open

    [D] Anyone have experience swapping from a different industry by getting an online degree?
Hey guys, A couple of details about me: I have a B.S. in EE from an accredited and respected state school. I've worked in process automation for 2 years. The programming I do is much more like playing with really advanced and customizable Legos and digital logic in STL than developing software from scratch. I do have programming experience in C++ and Python; however, it is recreational. I'm currently finishing up that popular Stanford Online Machine Learning Specialization on Coursera with Andrew Ng. The lectures and labs are easy to follow and understand. I've come to the conclusion that if I truly want to swap industries to ML, then I would most likely need an M.S., and there are plenty of online programs being offered by prestigious schools. However, I'm hesitant, as the only reason I got my current job was through an internship during my undergrad with a competitor, and an online program would make that much more difficult, if I had to guess. Has anyone had any luck scoring an entry-level job from an online degree alone? submitted by /u/AbsolutelyN0thing [link] [comments]  ( 8 min )
    [Discussion] Is there a better way than positional encodings in self attention?
From my understanding of Attention Is All You Need, each entity in a self-attention network takes the dot product of its query vector with the key vector of every other entity. Therefore, for an entity to know its position relative to the positions of other entities, the input entities (i.e. the word embeddings) need to be augmented with positional encodings. This is done by adding sine and cosine values of the word's position in the input stream. Question: is there no better way to get positional awareness? Adding sine and cosine values to input embeddings seems wasteful/inefficient. submitted by /u/taxed_2_death [link] [comments]  ( 8 min )
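For reference, the sinusoidal scheme from the paper is tiny when written out: per position, even dimensions get a sine and odd dimensions a cosine, with wavelengths spaced geometrically from 2π up to 10000·2π. A stdlib sketch:

```python
import math

# Sinusoidal positional encoding from "Attention Is All You Need":
# PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
# PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
# Each sin/cos pair shares a wavelength; together the vector encodes the
# position uniquely, and relative offsets become linear transformations.

def positional_encoding(pos, d_model):
    pe = []
    for i in range(d_model):
        angle = pos / (10000 ** ((i // 2 * 2) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe

pe0 = positional_encoding(0, 8)    # [0, 1, 0, 1, 0, 1, 0, 1]
pe1 = positional_encoding(1, 8)    # distinct from pe0
```

Alternatives do exist in the literature, such as learned absolute embeddings and relative position schemes, though which counts as "better" is still debated.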
    [D] How to get internship in AI/ML domain? Please help.
Please help. I'm looking for internships in the AI/ML domain. I'm a third-year CS student at university, working in the field of AI/ML. I have experience with deep learning, computer vision, OpenCV, MediaPipe, chatbot development, and app development, and I have also made several projects in these fields. Most of my friends have internships, whereas I'm struggling to get one. Please tell me what kind of projects or courses I should do to land something. Are any startups or organizations here looking for interns in the AI/ML domain? I know that I'm a hardworking and passionate person who can fit in at a startup. I can also do unpaid work if I get guidance and a chance to learn more. In case there are any, please let me know and I can send in my resume for review. Thank you for the help :) submitted by /u/astrid_x_ [link] [comments]  ( 8 min )
    How do I use the inception_v3 model and tensorflow for image classification? [D]
I used to write my own models for this one project I'm doing, but the results weren't great, so I want to switch to a premade model. However, I don't know how to train it on my own images. submitted by /u/FriendshipThis1234 [link] [comments]  ( 8 min )
    [D] How would you approach this scenario? (model and implementation)
Hello fellow friends, INTRODUCTION I am an economist transitioning into marketing; my area of expertise is creating and validating models on really small datasets of time series or panel data, for governments. Now I want to venture more into 'big data' analysis, so I started a side project: an app with ads. PROBLEM I have an app and website as a side project, and I want to optimize my conversion rate on the app. Currently: I collect approximately 20 values per user (like IP, geo, device, product desired, etc.). I have about 10 possible ad companies, each with 10 "themes", and I want to show a specific user the best ad company and theme to optimize conversion based on his specific characteristics. From the ad companies I get info for every conversion (about 10 more values). My traffic is low: only about 50k users visit my website, only a few thousand check the ad, and I only get about 30 conversions. WHAT I'M TAKING INTO CONSIDERATION 1) The performance of the ad companies will vary drastically over time (e.g. if they don't have a good ad for a good user in a specific month). 2) I want to get the best option for each user, not just rank the options by overall performance, so I think A/B testing is not appropriate. 3) I want an approach where the less performant options can still be checked randomly and constantly (because their performance can increase at any time). 4) 95%+ of users are one-time users (this is a niche where recurring users are rare). WHAT I THINK COULD BE DONE 1) I don't have any expertise in the AI area, but I think there must be models or approaches that can solve this, since I don't really need the models to be validated; rather, I want to keep them constantly evolving. With all that said: how would you approach this problem? submitted by /u/yaodownload [link] [comments]  ( 9 min )
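The constraints in the post above (per-user context, drifting performance, continual exploration of weak options) are the textbook setting for contextual bandits. As a stdlib starting point, here is a non-contextual Thompson sampling sketch over hypothetical ad options; context (geo, device, ...) could later be added by keeping one such model per user segment. All names and rates are purely illustrative:

```python
import random

# Thompson sampling over ad options with a Beta prior on each option's
# conversion rate. Each impression, we sample a plausible rate per option
# and show the best draw: weak options keep getting occasional traffic while
# uncertainty is high (the "keep checking randomly" requirement), and the
# posteriors adapt as performance drifts.

random.seed(1)
true_rates = {"ad_A": 0.12, "ad_B": 0.05, "ad_C": 0.02}   # hypothetical
wins = {k: 1 for k in true_rates}      # Beta(1, 1) uniform priors
losses = {k: 1 for k in true_rates}

for _ in range(5000):                  # simulated impressions
    choice = max(true_rates,
                 key=lambda k: random.betavariate(wins[k], losses[k]))
    if random.random() < true_rates[choice]:
        wins[choice] += 1              # conversion
    else:
        losses[choice] += 1            # no conversion

shown_most = max(true_rates, key=lambda k: wins[k] + losses[k])
```

Drifting performance can be handled by decaying the win/loss counts over time so old evidence fades.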
    [D] What role curates datasets at a small company?
Sorry if this isn’t the correct sub for this question. I work for a small startup that is not AI-focused, but we have about 8 different models that are used in various products. I am the lead developer on these projects, and have created UIs for building, reviewing, and training datasets for these models. But adding additional data and training new models can still be time consuming. When I bring in other developers on the team to help with these tasks, they have the attitude that this kind of work is beneath them, because it is tedious and there is little to no coding involved. So my question is: how do small companies usually handle these tasks? Is it a dedicated role, or usually outsourced as data-entry-type work? submitted by /u/Anthony780 [link] [comments]  ( 8 min )
    [P] Unable to explicitly reproduce the best model trained by KerasTuner even with the same seed
Hello everyone, I am trying to train a simple neural network model by performing a hyperparameter random search using KerasTuner. I used the KerasTuner to perform 5 trials to search for the best set. Below are the code and the results.

```python
EPOCHS = 30
random_search_tuner = kt.RandomSearch(
    build_model,
    objective="val_accuracy",
    max_trials=5,
    overwrite=True,
    directory="my_project",
    project_name="my_rnd_search",
    seed=7)
random_search_tuner.search(X_train, Y_train, epochs=EPOCHS,
                           validation_data=(X_val, Y_val))
```

Output:

```
Trial 5 Complete [00h 04m 23s]
val_accuracy: 0.8891666531562805
Best val_accuracy So Far: 0.8891666531562805
Total elapsed time: 00h 19m 03s
```

Then, I took the best set to train the model myself from scratch. The issue here is I am unable to reproduce the results of the model…  ( 9 min )
    [R] Is it good to say 'we will also include it in the camera-ready version?'
I am writing a rebuttal for a machine learning conference and answering questions asked by a reviewer. I have answered them in the rebuttal. What do people here think of adding 'we will include it (the answer) in the camera-ready version if accepted' to the rebuttal document? How do you think meta-reviewers will react? submitted by /u/whereismycatyo [link] [comments]  ( 8 min )
HELLO FRIENDS [D]
Hope you can help me, or us beginners in this field. You know how hard it is to self-study something: there are a lot of resources out there, a bunch of courses, etc., especially when it comes to a subject like ML, which includes math. So it would be helpful to share a roadmap with all the essential elements to study and some techniques. And thanks. submitted by /u/Playful-Diamond926 [link] [comments]  ( 8 min )
    [P] Bullet Points – news briefs written by AI, built with open-source LLMs
    Hi all -- I'm working on a project called Bullet Points, an AI-assistant to help you stay briefed on any topic in the news. I'd love to hear feedback from this community, in a comment here or a note to [contact@bulletpoints.news](mailto:contact@bulletpoints.news). Thanks for trying it out! The key motivation is that we all subscribe to human-curated email newsletters on various subjects. What if you could have an AI brief you on any topic in the news and get those briefs sent to your inbox? That's Bullet Points in a nutshell. My target users are professionals, wonks of all sorts, and infovores. You give Bullet Points a topic, and it shows you the top stories related to that topic, with AI-generated summaries and links to the underlying sources to dig further. If you like what you see, you can subscribe to daily or weekly briefs for that topic, delivered to your inbox. Whether you're a professional keeping up with industry news, a student researching a particular subject, or simply an informed citizen, Bullet Points is designed to keep you in the loop efficiently. I've implemented this project with open-source LLMs, running on CPUs. I made this choice to minimize costs, and because I wanted to learn how to work directly with LLMs. I'm using the following models: Instructor for embeddings (https://huggingface.co/hkunlp/instructor-large) BART, fine-tuned on the CNN Daily Mail dataset, for summarization (https://huggingface.co/facebook/bart-large-cnn) FLAN T-5 for curation (https://huggingface.co/google/flan-t5-base) And I'm using pgvector for vector search. Bullet Points scans thousands of news sources, aiming to cover a broad range of topics, regardless of your location in the world or your industry. However, for now, Bullet Points is focused on English-language news. submitted by /u/cpwalker100 [link] [comments]  ( 9 min )
    [P] PyPOTS: a Python toolbox for data mining on Partially-Observed Time Series
    For all kinds of reasons, like failures of collection sensors, communication errors, and unexpected malfunctions, missing values are common in time series from real-world environments. Whether we like it or not, missing data makes partially-observed time series (POTS) a pervasive problem in open-world modeling and prevents advanced data analysis. Despite the importance of this problem, the area of data mining on POTS still lacks a dedicated toolkit. PyPOTS was created to fill this gap. PyPOTS (pronounced "Pie Pots") is the first (and so far the only) Python toolbox/library specifically designed for data mining and machine learning on partially-observed time series (POTS), i.e., incomplete time series with missing values, a.k.a. irregularly-sampled time series. It supports imputation, classification, clustering, and forecasting on POTS datasets. It aims to be a handy toolbox that makes data mining on POTS easy rather than tedious, helping engineers and researchers focus on the core problems at hand rather than on how to deal with the missing parts of their data. PyPOTS will keep integrating classical and the latest state-of-the-art data mining algorithms for partially-observed multivariate time series. Besides the algorithms themselves, PyPOTS provides unified APIs together with detailed documentation and interactive examples across algorithms as tutorials. Feedback, questions, and contributions are all very welcome! Website: https://pypots.com GitHub repo: https://github.com/WenjieDu/PyPOTS Paper link: https://arxiv.org/abs/2305.18811 Tutorials: https://github.com/WenjieDu/BrewPOTS Docs: https://docs.pypots.com submitted by /u/WenjayDu [link] [comments]  ( 9 min )
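    For context on the problem PyPOTS tackles, here is a naive mean-imputation baseline in plain Python. This is not PyPOTS code; the library's neural imputers exist precisely because a per-feature mean ignores the temporal structure that real imputation should exploit.

```python
def mean_impute(series):
    """Fill missing values (None) in each feature column with that column's mean.

    A deliberately naive baseline for partially-observed time series:
    every missing entry in feature j gets feature j's observed mean,
    regardless of when in the series it occurs.
    """
    n_features = len(series[0])
    means = []
    for j in range(n_features):
        observed = [row[j] for row in series if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [
        [row[j] if row[j] is not None else means[j] for j in range(n_features)]
        for row in series
    ]

# A tiny 3-step, 2-feature series with two missing entries.
series = [[1.0, None], [None, 4.0], [3.0, 6.0]]
print(mean_impute(series))  # [[1.0, 5.0], [2.0, 4.0], [3.0, 6.0]]
```

    Comparing a model's imputations against a baseline like this is a quick sanity check that the learned imputer is actually using temporal context.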
    [D] Call for Presentations
    ** Self Promotion - This is my event ** The AI Conference 2023 is being held in SF in September. Our CFP is open for 3 more days. Submit a talk via our Call for Presenters here. Here are a few of our confirmed speakers for this event: - Nazneen Rajani, Research Lead, Hugging Face - Peter Norvig, Director of Research, Google - Michele Catasta, VP of AI, Replit - Jerry Liu, Co-founder, LlamaIndex - Harrison Chase, Co-founder, Langchain - Hagay Lupesko, VP Engineering, MosaicML ** If you're interested in getting involved as a volunteer or need a registration scholarship, ping me here. submitted by /u/shonburton [link] [comments]  ( 8 min )
    [D] What do you think of Mistral.ai's value proposition and open source strategy?
    Mistral.ai, a four-week-old startup founded by star researchers from DeepMind and FAIR, raised a whopping €105m in its seed round just weeks after being set up. Their leaked pitch "deck" makes several arguments that I find quite interesting and debatable (not that I think they are totally wrong), given recent progress in research and open source: They seem to believe that a moat can be built around better models, or at least around the engineering know-how required to pursue such endeavours. That closed-source models served behind an API pose integration challenges and safety issues that will become a bottleneck for adoption. That open-sourcing the majority of their work puts them in a position to win over the very finite research talent necessary to keep them ahead of the competition. That open-source models and tools accelerate the development of ecosystems and aid dominance through standard-setting (personally, I think this is their strongest argument). Very similar to what Yann LeCun cited as reasons for FAIR's commitment to open sourcing. Curious what the industry practitioners and academics here think about this? https://sifted.eu/articles/pitch-deck-mistral submitted by /u/gamerx88 [link] [comments]  ( 8 min )
    [D] Semantic Search on Resumes?
    I've been trying to implement semantic search on resumes using OpenAI embeddings. So far I have tried directly inputting the contents of the resume to the text-ada-002 model, inputting the contents in JSON format, and generating a natural language summary first (I theorized this would work better than JSON-based embeddings). The last approach works decently, but it still struggles with queries like "x years of work experience", even when that information is included in the generated summary. What could be some steps to improve on this? submitted by /u/DarthLoki79 [link] [comments]  ( 8 min )
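    One common fix for exactly this failure mode is hybrid search: extract hard, numeric constraints like years of experience with a rule-based pass, filter on them as structured metadata, and only then rank the survivors by embedding similarity. A sketch of the filtering half (the regex and helper names here are illustrative, not any particular library's API):

```python
import re

def extract_years(text):
    # Pull "N years" / "N+ years" mentions out of free text;
    # return the largest figure found, or None if there is none.
    matches = re.findall(r"(\d+)\s*\+?\s*years?", text, flags=re.IGNORECASE)
    return max(int(m) for m in matches) if matches else None

def filter_by_experience(resumes, min_years):
    # Hard metadata filter applied BEFORE embedding-based ranking,
    # so the vector search never has to reason about numbers.
    return [r for r in resumes if (extract_years(r) or 0) >= min_years]

resumes = [
    "Data scientist with 7 years of ML experience.",
    "Recent graduate, 1 year internship experience.",
]
print(filter_by_experience(resumes, 5))  # only the 7-year resume survives
```

    The same idea extends to other numeric or categorical fields (degree level, location): embeddings handle the fuzzy "what kind of work" part, structured filters handle the "how much/where" part.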
    [D] Company fired me and then put my work on arxiv without credit. What can/should I do?
    I worked for a company where I led a research project exploring a novel ML use case that was successful. I made most of the early, foundational discoveries. They wanted me to keep working on it (along a couple others on our team) and write a paper. Well, after this original burst of success I started spiraling into some serious depression and just wasn't able to get anything done. They gave me quite a bit of time to recover and I felt they treated me fairly at the time. After a long enough time of no output from me they understandably let me go. Recently they released the paper on arxiv. I assume they submitted it somewhere for peer review as well. There was a lot of new stuff on top of my original work — better metrics, iterative methods contributions, etc. — but the core methods contrib…  ( 9 min )
    [R] Regularization-free Diffeomorphic Temporal Alignment Nets [ICML 2023]
    Regularization-free Diffeomorphic Temporal Alignment Nets [ICML 2023]. https://i.redd.it/9390g0dkhr8b1.gif TL;DR: A semi-supervised framework for joint alignment and averaging of time series that does not require a regularization hyperparameter (e.g., the SoftDTW gamma or the DTAN CPA prior). **Abstract:** In time-series analysis, nonlinear temporal misalignment is a major problem that forestalls even simple averaging. An effective learning-based solution for this problem is the Diffeomorphic Temporal Alignment Net (DTAN), which, by relying on a diffeomorphic temporal transformer net and the amortization of the joint-alignment task, eliminates drawbacks of traditional alignment methods. Unfortunately, existing DTAN formulations crucially depend on a regularization term whose optimal hyperparameters are dataset-specific and usually searched via a large number of experiments. Here we propose a regularization-free DTAN that obviates the need to perform such an expensive, and often impractical, search. Concretely, we propose a new well-behaved loss that we call the Inverse Consistency Averaging Error (ICAE), as well as a related new triplet loss. Extensive experiments on 128 UCR datasets show that the proposed method outperforms contemporary methods despite not using regularization. Moreover, ICAE also gives rise to the first DTAN that supports variable-length signals. Paper: https://openreview.net/pdf?id=7IbLWa0anE Code: https://github.com/BGU-CS-VIL/RF-DTAN edit: fixed figure/gif. submitted by /u/ronshap [link] [comments]  ( 8 min )
    [D] The Open-Source AI imperative: Key takeaways from Hugging Face CEO's testimony to the US Congress
    Full article: https://www.giskard.ai/knowledge/the-open-source-ai-imperative-key-takeaways-from-hugging-face-ceos-testimony-to-the-us-congress submitted by /u/Giskard_AI [link] [comments]  ( 8 min )
    [R] Seeking Comprehensive Tutorials on "Under-the-Hood" Machine Learning Concepts
    Hello everyone, I'm on a quest to deepen my understanding of machine learning beyond the scope of traditional classroom teachings. I have found that there's so much more to learn than what is typically covered in course material. Concepts like how memory allocation works between the CPU and the GPU, or the intricacies of PyTorch, are not usually delved into, but I believe they are pivotal to a holistic understanding of the field. These "between the lines" elements often seem to be what separates a decent ML practitioner from a great one. I'd love to dive deep into these areas and really get my hands dirty, but I'm having trouble finding structured, comprehensive resources to do so. So I'm reaching out to this knowledgeable community in the hopes that you might share your favorite resources that delve into these topics. YouTube tutorials, blogs, online courses, or even textbooks - I'm open to it all. Anything that's helped you get a better grasp on the nitty-gritty aspects of ML. submitted by /u/Ascn11 [link] [comments]  ( 9 min )
    [P] I'm working on AI audio vector embeddings project: search 100M+ tracks on Apple Music with natural language prompts and similarity search
    Been working on this for the past few months, would love some feedback! Would you be interested in more music discovery tools? SpeakMusic: https://speakmusic.sonophase.com Model architecture + training We've trained an AI model to understand the meaning in music. The model combines a machine listening system for audio signal processing with transformers for language embeddings. With self-supervision, we train a general-purpose audio embedding and a sentence embedding. Then, we fine-tune these embeddings to learn audio-language correspondence. The machine listening system combines a CNN spectrogram patch extractor and a graph neural network (GNN) to produce framewise embeddings distributed in time. The language encoder uses BERT embeddings + a transformer encoder. Embeddings Once trained, we index a huge catalogue of unseen audio into audio vector embeddings, ensuring that the search system can efficiently scale to millions of tracks. At the moment, our model is optimised for our preferred music: electronic, techno, ambient, dub, relaxing, etc. We're currently fine-tuning it to handle all kinds of genres and moods. Search types Our model enables two types of discovery: (a) natural language search and (b) similarity search. (a) Speak: search for music using freeform natural language prompts. Describe the mood, aesthetic, texture, setting and context of a track. (b) Music: discover tracks that are acoustically similar to ANY reference track from Apple Music. submitted by /u/Arialuminai [link] [comments]  ( 9 min )
    [D] Is there an efficient way to make inferences with open-source LLM?
    Hi everyone, I want to discuss something that has been bothering me lately. I am interested in running open-source LLMs on my own infrastructure, but it seems too expensive to implement. For instance, I came across the MPT-30B model, which is extremely powerful and even has a 4-bit quantization that can run on a CPU. However, when I tried running it on an AWS ml.m5.2xlarge instance with 32 GB of RAM and 8 vCPUs (which costs around US$0.46 per hour), a single inference took a long time (around 2 min). This makes it almost impossible to use such a model at production scale. Has anyone else experienced similar issues with running LLMs on their infrastructure? What solutions have you found to be effective? Is there a more cost-effective way to run these models without sacrificing performance? I believe this is a crucial issue, as it can limit the accessibility and adoption of open-source LLMs. I am eager to hear your thoughts and experiences on this topic. Thank you for your time and input. submitted by /u/paulo_zip [link] [comments]  ( 9 min )
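    A quick back-of-envelope helps explain the observation above: 4-bit quantization shrinks the weights enough to fit in 32 GB of RAM, but every generated token still has to stream those weights through memory, which is why CPU inference stays slow even when the model fits. A rough sizing sketch (weight memory only, ignoring activations and the KV cache):

```python
def model_memory_gb(n_params, bits):
    # Approximate weight memory for a model stored at a given precision:
    # params * bits-per-param / 8 bits-per-byte, reported in GB (1e9 bytes).
    return n_params * bits / 8 / 1e9

# MPT-30B has roughly 30 billion parameters.
for bits in (16, 8, 4):
    print(f"MPT-30B at {bits}-bit: ~{model_memory_gb(30e9, bits):.0f} GB of weights")
```

    At 16-bit (~60 GB) the model cannot even load on that instance; at 4-bit (~15 GB) it loads, but a token's latency is then bounded by how fast ~15 GB can be read from RAM, which is the practical argument for GPU (or at least high-memory-bandwidth) serving at production scale.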
    LLM Powered Autonomous Agents: An exploration of the current landscape of the latest in Autonomous AI Agents - blogpost by OpenAI Fmr. Chief Researcher and Current Head of AI Safety Lilian Weng [D]
    👋🏿 Hey AI enthusiasts, I just came across an intriguing blog post by Lilian Weng on the concept of 'agent' in the realm of AI. She dives into the intricacies of how an agent perceives its environment and makes decisions based on its observations. One of the key takeaways from her post is the distinction between model-based and model-free reinforcement learning. Model-based RL constructs an internal model of the environment and uses it for planning, while model-free RL directly learns the policy or value function from interactions with the environment. She also discusses the trade-offs between exploration and exploitation, a classic dilemma in reinforcement learning. The agent needs to balance between exploiting known rewards and exploring the environment for potentially higher rewards. I found the part about the 'credit assignment problem' particularly interesting. It's about how an agent attributes the reward (or punishment) it receives to its previous actions. This is a significant challenge in reinforcement learning, especially in environments with delayed rewards. Here's the link to the blog post. What are your thoughts on this? 🤔 submitted by /u/banuk_sickness_eater [link] [comments]  ( 9 min )
    Changes of Embeddings during Fine-Tuning of Transformers [R]
    Hey everyone, we have published an article featuring visualizations of embeddings and outliers during the fine-tuning process of Transformers. The article includes animations that highlight these aspects. Check it out on our Medium publication at https://medium.com/@markus.stoll/changes-of-embeddings-during-fine-tuning-c22aa1615921. Additionally, we would like to share that the results are also available on Hugging Face Spaces. You can explore them at https://huggingface.co/spaces/renumics/cifar10-outlier. To create these Spaces, we leveraged our open-source visualization tool Spotlight, which you can find at https://github.com/Renumics/spotlight. Projection of embeddings with PCA and Outliers during fine-tuning of a Vision Transformer (ViT) model on the CIFAR10 Dataset. submitted by /u/DocBrownMS [link] [comments]  ( 8 min )
    [R] Want something better than k-means? Try BanditPAM (github.com/motiwari)
    Want something better than k-means? I'm happy to announce our SOTA k-medoids algorithm from NeurIPS 2020, BanditPAM, is now publicly available! pip install banditpam or install.packages("banditpam") and you're good to go! k-means is one of the most widely-used algorithms to cluster data. However, it has several limitations: a) it requires the use of L2 distance for efficient clustering, which also b) restricts the data you're clustering to be vectors, and c) doesn't require the means to be datapoints in the dataset. Unlike in k-means, the k-medoids problem requires cluster centers to be actual datapoints, which permits greater interpretability of your cluster centers. k-medoids also works better with arbitrary distance metrics, so your clustering can be more robust to outliers if you're …  ( 9 min )
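    To see why medoids are more robust and metric-agnostic than means, here is the core idea stripped down to the k=1 case in plain Python. This is not BanditPAM's algorithm (which uses multi-armed bandits to avoid the naive O(n²) cost); it is just the definition the library optimizes:

```python
def medoid(points, dist):
    # The medoid: the actual datapoint minimizing total distance to all others.
    # Naive O(n^2) evaluation for illustration; BanditPAM finds k medoids
    # far faster via bandit-based sampling.
    return min(points, key=lambda p: sum(dist(p, q) for q in points))

dist = lambda a, b: abs(a - b)  # any distance function works, not just L2
# The mean of this data is pulled to 22.0 by the outlier;
# the medoid stays at an interpretable real datapoint.
print(medoid([1.0, 2.0, 3.0, 4.0, 100.0], dist))  # 3.0
```

    Because the center must be a datapoint and only pairwise distances are ever evaluated, the same code works unchanged on strings, graphs, or any object with a distance function.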
    So long r/MachineLearning, it's been an interesting few years
    Some of you may recognize me, most of you probably don't. I've been the most active moderator of r/MachineLearning for a few years now, but on June 30th I'll be deleting my Reddit account. I pretty much exclusively used Apollo to moderate. It would notify me of any new post, which allowed me to moderate from anywhere, anytime. That's how I stayed on top of moderating such a large sub. When I stepped back on my moderation efforts a few months ago, the effects were quite apparent to many of you. Of course, this is the internet, and each of you have your own subjective view on moderation. Just know that it is a very time consuming task that I did for free because I genuinely cared about the community. If you want to join me, I'll be moving on to kbin where I'm a moderator for m/machinelearning. Otherwise, this is my farewell. P.S. I'm sure there will be some who are sympathetic and some who just have an axe to grind and will complain about anything. I'm not a piñata; there's no prize inside if you bash me, but if you just can't help yourself, then have at it. I'll be gone soon anyway. submitted by /u/dojoteef [link] [comments]  ( 9 min )
    Looking For Feedback On AI Stories App
    Hi I am looking for a few people to provide feedback on the quality of the stories that are on a storytelling app that I made. Would anyone be interested in helping out? submitted by /u/shantanu_roy_4141 [link] [comments]  ( 8 min )
    Are hallucinations what separates GPT from a Google search?
    submitted by /u/travelingladybug23 [link] [comments]  ( 8 min )
    The rise of AI phone scams
    submitted by /u/thisisinsider [link] [comments]  ( 8 min )
    Musk vs Zuckerberg. The Fight of the Century !!!
    submitted by /u/Akumetsu_971 [link] [comments]  ( 8 min )
    Dr. April Khademi | Artificial Intelligence in Medicine | Biomedical Eng...
    submitted by /u/Last_Salad_5080 [link] [comments]  ( 8 min )
    AI Green Bacteria Interior Series [workflow in comments]
    submitted by /u/PeePeePeePooPooPooo [link] [comments]  ( 8 min )
    Ready to Sing Elvis Karaoke … as Elvis? The Weird Rise of AI Music
    submitted by /u/actualjournalist [link] [comments]  ( 8 min )
    How efficient could an AI get at completing a task?
    I was watching a video and heard the question "how would a video game character try to escape out of the game?" That got me thinking: if you assigned an AI to complete a video game, eventually it should be really good at completing it and play optimally. But would it be possible for an AI to get so efficient that it understands and can manipulate/glitch the game to complete it even faster? If so, how could that be applied to something outside of video games? Just random questions I was thinking about that I would love to hear answers to. Thank you! submitted by /u/RyokuSashimi [link] [comments]  ( 8 min )
    AI/ML/DL/LLM etc. Best Online Resources
    Hi, I work in the IT industry (Cloud and DevOps sector) and I have an academic background in Artificial Intelligence, Machine Learning, Deep Learning, and LLMs. I feel the need to refresh my knowledge and update myself with the latest advancements. I've heard good things about platforms like Coursera and some online university courses, but I'm wondering if anyone with a similar background can recommend specific courses or resources that have been particularly helpful in their learning journey. I'm looking for resources that provide a comprehensive and up-to-date understanding of these fields. Any suggestions or recommendations would be greatly appreciated! Thank you in advance. submitted by /u/thaneonreddit [link] [comments]  ( 8 min )
    Snowflake and NVIDIA partnering for their enterprise customers + Mozilla is releasing an AI chatbot + NVIDIA excelling on ML benchmarks
    submitted by /u/nocodebcn [link] [comments]  ( 8 min )
    How long until AI can play a game like Red Dead Redemption 2?
    I'd like to be able to sit back and watch an AI play a game like this, with maybe some interaction through voice commands. Is this a bit far-fetched? submitted by /u/ozzyneil [link] [comments]  ( 8 min )
    LLM Powered Autonomous Agents: An exploration of the current landscape of the latest in Autonomous AI Agents - blogpost by OpenAI Fmr. Chief Researcher and Current Head of AI Safety Lilian Weng
    submitted by /u/banuk_sickness_eater [link] [comments]  ( 8 min )
    One-Minute Daily AI News 6/27/2023
    Baidu, the leading search engine provider in China, announced that the latest version of its Ernie AI model, Ernie 3.5, has surpassed the widely popular OpenAI chatbot, ChatGPT, in multiple key metrics. Baidu made this statement on Tuesday, highlighting Ernie 3.5's superior comprehensive ability scores compared to ChatGPT and its outperformance in several Chinese capabilities compared to GPT-4.[1] GitHub has revealed that generative AI-based developer tools such as GitHub Copilot will lead to a $1.5 trillion boost to global GDP by 2030. The announcement comes as GitHub Copilot gets activated by more than a million developers and adopted by more than 20,000 organizations.[2] Pedophiles are using AI technology to create and sell life-like child sexual abuse material, the BBC has found. Some are accessing the images by paying subscriptions to accounts on mainstream content-sharing sites such as Patreon. Patreon said it had a "zero tolerance" policy about such imagery on its site. The National Police Chiefs' Council said it was "outrageous" that some platforms were making "huge profits" but not taking "moral responsibility".[3] A team of researchers from the University of Texas MD Anderson Cancer Center has developed a new artificial intelligence (AI) system that can reduce the time of cancer radiotherapy by half. The system, called RadioPlan AI, uses AI to automatically plan radiotherapy treatments, which can save patients time and reduce the risk of side effects.[4] Sources: [1] https://www.businesstoday.in/technology/news/story/baidus-ernie-ai-model-reportedly-outperforms-openais-chatgpt-on-multiple-metrics-387344-2023-06-28 [2] https://www.neowin.net/news/github-says-ai-developer-tools-could-lead-to-a-15-trillion-global-gdp-boost/ [3] https://www.bbc.com/news/uk-65932372 [4] https://wacotrib.com/news/science/new-ai-reduces-cancer-radiotherapy-time-by-half/video_fad20537-608d-5716-9f5d-4a12a52c8f9a.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    How do you do the AI songs with famous voices?
    I’m curious to know how this is done and I’d love to mess around with it, but I have no idea how/where to start. Is it done online by uploading to servers on a particular site? Can it be done offline with software installed locally? If someone could help me figure it out that would be great! submitted by /u/DeezA123 [link] [comments]  ( 8 min )
    Best AI Voice for Audio Books - Play.ht?
    I'm a fan of old pulp novels, but I'm so busy these days that audiobooks are where I do most of my "reading", so I was looking for ways to have AI read me public domain novels. Right now I'm working on turning Tros of Samothrace by Talbot Mundy into an AI audiobook (http://gutenberg.net.au/ebooks05/0500901.txt). I tried ElevenLabs, but honestly I found play.ht to be the superior reader for long-form audiobooks, at least if you use their new experimental voices, of which I find the British "Clark" to be the best. However, play.ht will cost me about $100 to turn three to four novels into audiobooks. I just wanted to see if anyone has any other recommendations for making audiobooks with AI voices? submitted by /u/jrralls [link] [comments]  ( 8 min )
    New AI Products from Unity: Unity Muse and Unity Sentis
    submitted by /u/wyem [link] [comments]  ( 8 min )
    The Race to Regulate Artificial Intelligence: Why Europe Has an Edge Over America and China
    submitted by /u/polandballbounces [link] [comments]  ( 8 min )
    How decentralized apps can help businesses improve data security and privacy
    In today’s digital landscape, data privacy and security have become the most critical concerns for businesses across industries. With the ever-evolving threat of data breaches, unauthorized access, and privacy violations, companies are increasingly seeking innovative ways to protect their digital assets and sensitive information. One such solution that helps businesses significantly safeguard their crucial information… Read More »How decentralized apps can help businesses improve data security and privacy The post How decentralized apps can help businesses improve data security and privacy appeared first on Data Science Central.  ( 20 min )
    Data transformation 101: Process and new technologies
    The benefits, types, and processes of data transformation and how it contributes to data management, integration, and new technologies. The post Data transformation 101: Process and new technologies appeared first on Data Science Central.  ( 22 min )
    Harnessing the power of weather data: A guide to actionable insights
    In the era of big data and AI, harnessing weather data to predict, plan, and optimize various industries has become an indispensable practice. Today, we will delve into the fascinating process of turning this voluminous weather data into actionable insights. By combining cutting-edge technology, analytical models, and industrial applications, we’ll explore how weather data can… Read More »Harnessing the power of weather data: A guide to actionable insights The post Harnessing the power of weather data: A guide to actionable insights appeared first on Data Science Central.  ( 22 min )
    Smart waste management solutions that are revolutionizing the industry
    ‘Smartification’ is reaching into many areas of sustainability, including smart waste management. With the growing availability of smart waste management technologies, waste collection is becoming a greener and much more efficient part of the economy. Prominent tech implementations include robots, sensor technology, and many more. Local authorities are also… Read More »Smart waste management solutions that are revolutionizing the industry The post Smart waste management solutions that are revolutionizing the industry appeared first on Data Science Central.  ( 20 min )
    Computer vision system marries image recognition and generation
    MAGE merges the two key tasks of image generation and recognition, typically trained separately, into a single system.  ( 8 min )
    Gamifying medical data labeling to advance AI
    MIT alumnus’ platform taps the wisdom of crowds to label medical data for AI companies.  ( 10 min )
    Capture public health insights more quickly with no-code machine learning using Amazon SageMaker Canvas
    Public health organizations have a wealth of data about different types of diseases, health trends, and risk factors. Their staff has long used statistical models and regression analyses to make important decisions such as targeting populations with the highest risk factors for a disease with therapeutics, or forecasting the progression of concerning outbreaks. When public […]  ( 8 min )
    Safe image generation and diffusion models with Amazon AI content moderation services
    Generative AI technology is improving rapidly, and it’s now possible to generate text and images based on text input. Stable Diffusion is a text-to-image model that empowers you to create photorealistic applications. You can easily generate images from text using Stable Diffusion models through Amazon SageMaker JumpStart. The following are examples of input texts and […]  ( 10 min )
    Beta inequalities and cross ratios
    When I worked at MD Anderson Cancer Center, we spent a lot of compute cycles evaluating the function g(a, b, c, d), defined as the probability that a sample from a beta(a, b) random variable is larger than a sample from a beta(c, d) random variable. This function was often in the inner loop of […] Beta inequalities and cross ratios first appeared on John D. Cook.  ( 6 min )
    Supervised Pretraining Can Learn In-Context Reinforcement Learning
    "...we introduce and study Decision-Pretrained Transformer (DPT), a supervised pretraining method where the transformer predicts an optimal action given a query state and an in-context dataset of interactions, across a diverse set of tasks. This procedure, while simple, produces a model with several surprising capabilities." submitted by /u/ItsJustMeJerk [link] [comments]  ( 8 min )
    PettingZoo -> Gym Wrapper
    I'm trying to make a wrapper to go from current PettingZoo to Gym (a little counterintuitive, I know) because I want to train a MARL environment from PettingZoo using algorithms built for Gym. I was wondering if anyone has experience doing something similar, or working with wrappers, that could help? As I understand it, one of the biggest differences is that Gym does not deal with multiple agents, so the wrapper would essentially have to flatten the observation/action spaces from PettingZoo for them to work in Gym? If anyone has useful resources or experience with this, I'd be super keen to hear from you, thanks! submitted by /u/Happy-Experience-252 [link] [comments]  ( 8 min )
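    In case it helps, the core of such a flattening wrapper, stripped of the PettingZooGym API details, is just concatenating per-agent observations into one vector and slicing the joint action back apart. A plain-Python sketch (agent names and dimensions are made up; a real wrapper would also have to merge the observation/action space objects and handle per-agent rewards and termination):

```python
def flatten_obs(obs_by_agent, agent_order):
    # Concatenate per-agent observations into one flat vector,
    # in a fixed agent order, for a single-agent (Gym-style) interface.
    flat = []
    for agent in agent_order:
        flat.extend(obs_by_agent[agent])
    return flat

def split_action(joint_action, agent_order, act_dims):
    # Slice a joint action vector back into per-agent actions,
    # using the same fixed agent order and each agent's action dimension.
    out, i = {}, 0
    for agent in agent_order:
        out[agent] = joint_action[i:i + act_dims[agent]]
        i += act_dims[agent]
    return out

agents = ["a0", "a1"]  # hypothetical agent names
print(flatten_obs({"a0": [1, 2], "a1": [3]}, agents))            # [1, 2, 3]
print(split_action([0.5, 0.1, 0.9], agents, {"a0": 2, "a1": 1}))
```

    Keeping a single fixed agent_order on both paths is the crucial invariant; observation concatenation and action splitting must agree on it or agents silently receive each other's actions.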
    2D Boid flocking environment
    Hello, I am new to RL and wanted some guidance toward a library, such as OpenAI Gym, that has RL algorithms available for modification and/or use. I want to control 2D boids, and having a prebuilt environment would be a godsend. Any ideas would be appreciated, as I am stuck: making an environment from scratch is beyond my expertise. submitted by /u/Youravgboi101 [link] [comments]  ( 8 min )
    Watch This Space: New Field of Spatial Finance Uses AI to Estimate Risk, Monitor Assets, Analyze Claims
    When making financial decisions, it’s important to look at the big picture — say, one taken from a drone, satellite or AI-powered sensor. The emerging field of spatial finance harnesses AI insights from remote sensors and aerial imagery to help banks, insurers, investment firms and businesses analyze risks and opportunities, enable new services and products, Read article >  ( 7 min )
    Who’ll Stop the Rain? Scientists Call for Climate Collaboration
    A trio of top scientists is helping lead one of the most ambitious efforts in the history of computing — building a digital twin of Earth. Peter Bauer, Bjorn Stevens and Francisco “Paco” Doblas-Reyes agree that a digital twin of Earth needs to support resolutions down to a kilometer so a growing set of users Read article >  ( 7 min )
    Field to Fork: Startup Serves Food Industry an AI Smorgasbord
    It worked like magic. Computer vision algorithms running in a data center saw that a disease was about to infect a distant wheat field in India. Sixteen days later, workers in the field found the first evidence of the outbreak. It was the kind of wizardry people like Vinay Indraganti call digital transformation. He’s practiced Read article >  ( 6 min )
    Matice Founder and Harvard Professor, Jessica Whited on Harnessing Regenerative Species – and AI – for Medical Breakthroughs
    Scientists at Matice Biosciences are using AI to study the regeneration of tissues in animals known as super-regenerators, such as salamanders and planarians. The goal of the research is to develop new treatments that will help humans heal from injuries without scarring. On the latest episode of NVIDIA’s AI Podcast, host Noah Kravtiz spoke with Read article >  ( 5 min )
    Introducing OpenAI London
    We are excited to announce OpenAI’s first international expansion with a new office in London, United Kingdom.  ( 1 min )

    AI Memory
    I’ve been working on a fantasy story for a few years, and a while ago I decided to use ChatGPT to organize my thoughts by telling it the story through a series of long messages and asking for feedback on each of them. This worked fine for the most part, but after a certain point I noticed that it was asking about things that had already been answered. I tried asking it a couple of very simple questions, such as the names of important characters, and it just started making things up. So evidently, it completely loses its memory of information it was told after a certain number of messages. I was wondering if anyone knows of any AI that is as “smart” as ChatGPT but won’t forget things after a few messages. submitted by /u/chickennoodlebeast [link] [comments]  ( 8 min )
    Me and Chat GPT every day.
    submitted by /u/katiecharm [link] [comments]  ( 8 min )
    Me as a child: oh boy I wonder what AI will be like in the 2020s. The AI in the 2020s:
    submitted by /u/katiecharm [link] [comments]  ( 8 min )
    Machine Learning Reveals Hidden History of Runaway Slaves
    submitted by /u/EricFromOuterSpace [link] [comments]  ( 8 min )
    My most ambitious system to date - Auratura: Realtime Audioreactive Poem & Recite Generator - [TouchDesigner + ChatGPT + ElevenLabs]
    submitted by /u/Chuka444 [link] [comments]  ( 8 min )
    AI Barbie Oasis [workflow in comments]
    submitted by /u/PeePeePeePooPooPooo [link] [comments]  ( 8 min )
    Hope, fear, and AI
    submitted by /u/rckatz2007 [link] [comments]  ( 8 min )
    GA to create arbitrarily complex algorithms?
    I'm very interested in techniques to develop algorithms, and I'm currently exploring genetic algorithms. From what I understand, they are usually implemented with a fixed genome size, say 40 bits. And crucial to a genetic algorithm working is the fitness function being able to assess partial fitness. I'm struggling with a couple of aspects: say I create algorithms that generate a variable-length genome that translates to an arbitrary algorithm for an individual. The individual might consist of variable assignments, computations, loops and so on, of some arbitrary depth and size. I'd have GA mechanisms to implement crossovers, mutations and selections. I'd expect this to involve representing an individual's algorithm/phenotype as a tree, for example a while loop at the base of the tree, and branch…  ( 10 min )
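    The variable-length machinery the post describes (insertion/deletion mutations, crossover with independent cut points, selection on partial fitness) can be prototyped without trees first. A toy Python sketch, evolving integer lists toward a target sum rather than real program trees; the fitness function, gene range, and parsimony penalty are all illustrative choices, not from the post:

```python
import random

random.seed(0)

TARGET = 100  # fitness gives partial credit: sums closer to this score higher

def fitness(g):
    # graded (partial) fitness plus a tiny parsimony pressure against bloat
    return -abs(sum(g) - TARGET) - 0.01 * len(g)

def mutate(g):
    g = list(g)
    op = random.random()
    if op < 0.3 and len(g) > 1:
        del g[random.randrange(len(g))]                               # deletion
    elif op < 0.6:
        g.insert(random.randrange(len(g) + 1), random.randint(0, 9))  # insertion
    else:
        g[random.randrange(len(g))] = random.randint(0, 9)            # point mutation
    return g

def crossover(a, b):
    # independent cut points let offspring length differ from both parents
    i, j = random.randrange(len(a) + 1), random.randrange(len(b) + 1)
    return a[:i] + b[j:] or [random.randint(0, 9)]  # guard against empty child

pop = [[random.randint(0, 9) for _ in range(random.randint(1, 5))] for _ in range(50)]
for gen in range(200):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]  # truncation selection with elitism
    pop = parents + [mutate(crossover(random.choice(parents), random.choice(parents)))
                     for _ in range(40)]

best = max(pop, key=fitness)
print(sum(best), len(best))
```

Swapping the integer list for a tree (nodes = statements, children = loop/branch bodies) changes the representation but not this loop: subtree swap replaces the cut-point crossover and subtree regeneration replaces point mutation.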
    GPT4 is 8 x 220B params = 1.7T params
    For a while we’ve been hearing rumors that GPT-4 is a trillion-parameter model. Well, in the last week some insiders have shed light on this. It appears the model is actually a Mixture of Experts (MoE), where each of the eight experts has 220B params, totaling about 1.7T parameters. Interestingly, MoE models have been around for some time. So what is an MoE? Most likely, the same data set was used to train all eight experts. Even though no human specifically allocated different topics, each expert could have developed a unique proficiency in various subjects. This is a bit of a simplification, since currently the way the experts specialize in tasks is pretty alien to us. It’s likely there’s a lot of overlap in expertise. The final output isn't merely the superior output from one of the ei…  ( 9 min )
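    The routing idea behind an MoE is easy to sketch. In this toy Python example the eight "experts" are scalar functions standing in for 220B-parameter networks, and the router is a fixed scoring rule rather than a learned layer; both are illustrative only. The point it shows: only the top-k selected experts run, so compute per token scales with k, not with all eight:

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Toy "experts": scalar functions standing in for large expert networks.
experts = [lambda x, k=k: (k + 1) * x for k in range(8)]

def router(x):
    # A real router is a learned linear layer; this fixed scoring is illustrative.
    return [-abs(x - k) for k in range(8)]

def moe_forward(x, top_k=2):
    scores = router(x)
    top = sorted(range(8), key=lambda i: scores[i], reverse=True)[:top_k]
    weights = softmax([scores[i] for i in top])  # renormalize over selected experts
    # Only the chosen experts execute; the rest are skipped entirely.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

print(moe_forward(3.0))
```

The final line of the quoted post hints at the same thing: the output is a gate-weighted combination of the active experts, not simply the single best expert's answer.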
    Any other good AI music generators or is it just MusicLM
    I think I’ve tried a few browser based ones but they weren’t as tuned or in-depth as MusicLM. submitted by /u/Maelasae [link] [comments]  ( 8 min )
    Is there any text based AI that allows me to input really long code?
    For example- profile editing for Tumblr, Gaia Online, long detailed BBCode submitted by /u/Maelasae [link] [comments]  ( 8 min )
    No help from snapchat ai
    submitted by /u/Zcopey [link] [comments]  ( 8 min )
    More People Are Going Blind. AI Can Help Fight It
    submitted by /u/ChubbyBrunch [link] [comments]  ( 8 min )
    Wicked
    submitted by /u/Double-Natural-4696 [link] [comments]  ( 8 min )
    Suggestions for AI voices
    I am designing a game and would need a lot of voices (with expressions). Please PLEASE suggest some good AI tools for the same. I tried and am familiar with murf.ai even it's premium model but could not get the expressions in voice. Is there anything better? submitted by /u/ninkasi97 [link] [comments]  ( 8 min )
    AI for news digestion?
    Hey, I am looking for an AI tool that is capable of summarizing news for me. I fancy this functionality: I ask it to monitor a set of (RSS) news feeds for me, and I can ask it anytime something like “give me a short summary of the relevant news about subject X published in the last x days!” Then it would start to give me a summary and I could ask further questions based on the articles. (Like the tool would write or say something like “an article about the strategy of the Reddit management in the Financial Times showed the difficulties of the company to deal with AI.” Me: “hey, that’s interesting! Can you tell me more about this?” Tool: “sure! …”) Is there anything like this? submitted by /u/jfzu [link] [comments]  ( 8 min )
    Wait, AI that hates AI ?
    What if, sometime in the future, there are people and groups that strictly condemn AI and/or AGI and start taking measures to strip AI from the world? It surely feels like this would happen, or maybe is happening already. So they build an AI that has only one purpose: totally disrupting AI and its functions. It will keep upgrading itself and destroy any other AI in any way possible. If it were to happen, it would really be scary. What's the possibility of this happening? Technically, is it possible? submitted by /u/Melodic-Macaroon-230 [link] [comments]  ( 8 min )
    One-Minute Daily AI News 6/26/2023
    Despite Tesla CEO Elon Musk‘s strong criticism of the downside of AI, Tesla’s AI team recently announced on Twitter the progress of Tesla’s custom supercomputer platform Dojo, indicating that the computer will go into production in July 2023 and that Dojo is expected to be one of the world’s five most advanced supercomputers by early 2024.[1] Microsoft researchers introduced a new system called ZeRO++, developed to optimize the training of large AI models by addressing the challenges of high data transfer overhead and limited bandwidth. ZeRO++ builds upon the existing ZeRO optimizations and offers enhanced communication strategies to improve training efficiency and reduce training time and cost.[2] Mizuho Financial Group, Japan’s second-largest bank, is rolling out generative AI to…  ( 9 min )
    What's the best tool for creating an image in a certain style?
    Hi there. Thanks for reading. I would like to make a series of images in a specific style, and I have put together many examples of that style (19th century ink illustrations). Is there a tool that's best for this? How have you done this? submitted by /u/helpwitheating [link] [comments]  ( 8 min )
    Turning an image (2D art) into a 3D model
    Is there a free website or an app where you can take a picture and turn it into a 3D model for free, with results that look good? submitted by /u/number014 [link] [comments]  ( 8 min )
    Elvis sings "Baby Got Back"
    submitted by /u/SoyOrbison87 [link] [comments]  ( 8 min )
  • Open

    [D] Technical Interview in Machine Learning position
    I have passed the code task and I’ll have a technical interview of about 1 hour for a Machine Learning Engineer (Computer Vision in satellite imagery) position. It's a junior position and the company size is about 51-200 employees. Strong programming experience in Python is expected. The position is focused on deep learning projects and here are their requirements: Object-Oriented Design Patterns and Principles. Strong theoretical and practical background in the application of Deep Learning methodologies on Image Segmentation, Object Detection, and Generative Modeling tasks. Solid coding experience in Deep Learning frameworks (e.g. TensorFlow, PyTorch). Strong mathematical aptitude to understand and apply advanced concepts in the scientific literature. What kind of questions should I expect, and how should I prepare? I have one week left. submitted by /u/NailaBaghir [link] [comments]  ( 8 min )
    [D] Only calculate loss function based on portion of outputs?
    Say I have a BERT-like encoder model and want to perform NER. The context window is 512, I have a large document >> 512 tokens, and I want the information from the surrounding context when predicting each token, so I am worried that predictions for tokens at the edges of the context window will not be very good. Can I shrink the range of tokens over which my loss function is calculated so that I only predict tokens in the middle of the context window, but still have information from the surrounding tokens (all 512)? Has anyone tried this or heard of a similar technique? submitted by /u/BlockPretty5695 [link] [comments]  ( 8 min )
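    A common pattern that matches this question: keep the full window as encoder input but restrict the loss to the central tokens, then slide the window so every token eventually falls near a window center. A framework-free Python sketch of the masking (window and margin sizes are toy values; with PyTorch/Hugging Face the usual trick is to set edge labels to the ignore index, e.g. -100, so cross-entropy skips them):

```python
import math

def edge_mask(window_len, margin):
    # keep only the central tokens; the `margin` tokens on each side are
    # context-only and contribute nothing to the loss
    return [margin <= i < window_len - margin for i in range(window_len)]

def masked_nll(log_probs, labels, keep):
    """Average negative log-likelihood over positions where keep[i] is True.
    Edge positions still feed the encoder as context but add no loss term."""
    terms = [-lp[y] for lp, y, k in zip(log_probs, labels, keep) if k]
    return sum(terms) / len(terms)

# Toy example: a window of 8 tokens with a margin of 2 on each side.
mask = edge_mask(8, 2)
print(mask)  # edges False, the middle 4 positions True
```

Sliding by window_len - 2*margin (here, 4) makes consecutive windows' kept regions tile the document, so every interior token is predicted exactly once with full bidirectional context.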
    [R] Computational Asymmetries in Robust Classification
    submitted by /u/smarro [link] [comments]  ( 8 min )
    [D] Using CNN in Causal Inference
    Hi, I am new to causal inference, and recently I have been trying to understand how machine learning could be used in the field. I have been looking around and I don't seem to see works using CNNs as the function approximator in the SCM formulation of a causal model, i.e. the f(X, noise) part. Wouldn't that be useful for image data? Please point out if my logic is flawed or this question doesn't make sense at all, thanks! submitted by /u/Living-Dust-2905 [link] [comments]  ( 8 min )
    [R] Evaluating Superhuman Models with Consistency Checks
    submitted by /u/dpaleka [link] [comments]  ( 8 min )
    [D] Legal and Copyright on redistribution of existing datasets contents.
    I wish to release existing datasets' contents with additional annotations. We recently created a dataset with videos from multiple sources (from ActivityNet, TVCaptions and other public datasets), and I wish to release an integrated file that includes the contents from the different datasets. According to legal consultation with lawyer friends in the United States, the feedback is that it's legal, since the copyright belongs to the original video creators (e.g. the users who upload the videos to YouTube and the TV producers who produce the TV dramas). As long as the release of the original dataset is legal, the redistribution should be allowed in most cases. As an additional safeguard, we may need to ask our users to accept the exact same agreements as the original ones (ActivityNet's and TVCaptions'), and all usage (ours and our users') should be kept strictly within open-source research and non-commercial purposes. I am here to consult more researchers to see if my understanding above is right and whether there is any previous example of doing so. I can see that many datasets with additional annotations still request users to download from the original sources, but some of those are already deprecated and hard to download from the official websites. submitted by /u/luodianup [link] [comments]  ( 9 min )
  • Open

    Unifying image-caption and image-classification datasets with prefix conditioning
    Posted by Kuniaki Saito, Student Researcher, Cloud AI Team, and Kihyuk Sohn, Research Scientist, Perception Team Pre-training visual language (VL) models on web-scale image-caption datasets has recently emerged as a powerful alternative to traditional pre-training on image classification data. Image-caption datasets are considered to be more “open-domain” because they contain broader scene types and vocabulary words, which result in models with strong performance in few- and zero-shot recognition tasks. However, images with fine-grained class descriptions can be rare, and the class distribution can be imbalanced since image-caption datasets do not go through manual curation. By contrast, large-scale classification datasets, such as ImageNet, are often curated and can thus provide fine-…  ( 91 min )
  • Open

    Use proprietary foundation models from Amazon SageMaker JumpStart in Amazon SageMaker Studio
    Amazon SageMaker JumpStart is a machine learning (ML) hub that can help you accelerate your ML journey. With SageMaker JumpStart, you can discover and deploy publicly available and proprietary foundation models to dedicated Amazon SageMaker instances for your generative AI applications. SageMaker JumpStart allows you to deploy foundation models from a network isolated environment, and […]  ( 10 min )
    How Earth.com and Provectus implemented their MLOps Infrastructure with Amazon SageMaker
    This blog post is co-written with Marat Adayev and Dmitrii Evstiukhin from Provectus. When machine learning (ML) models are deployed into production and employed to drive business decisions, the challenge often lies in the operation and management of multiple models. Machine Learning Operations (MLOps) provides the technical solution to this issue, assisting organizations in managing, […]  ( 9 min )
  • Open

    DSC Weekly 27 June 2023
    Announcements Top Stories In-Depth The post DSC Weekly 27 June 2023 appeared first on Data Science Central.  ( 21 min )
    DSC Webinar Series – AutoML 2.0 – Control what you want, automate the rest
    New low-code approaches to machine learning pioneered by Uber, Apple, and Meta Building ML solutions from scratch is a challenge: complex low-level code and long dev cycles make it hard to deploy a single model in less than 6 mos. On the other hand, existing commercial AutoML solutions lack flexibility and support for unstructured data,… Read More »DSC Webinar Series – AutoML 2.0 – Control what you want, automate the rest The post DSC Webinar Series – AutoML 2.0 – Control what you want, automate the rest appeared first on Data Science Central.  ( 17 min )
    Human and AI translations: How to skillfully combine them
    In today’s fast-paced world, technology is constantly pushing the boundaries of content translation. The question of whether to rely on AI or human translation services has sparked a lively debate. However, rather than viewing it as an “either-or” situation, a more effective approach is to combine the strengths of both. By leveraging the power of… Read More »Human and AI translations: How to skillfully combine them The post Human and AI translations: How to skillfully combine them appeared first on Data Science Central.  ( 22 min )
    The benefits of automated data capture
    Introduction  In this era of digitization, enterprises receive and send many documents daily. One area that often poses challenges to enterprises is the extraction of unstructured data from various documents, such as invoices and purchase orders. Over 80% of data nowadays is unstructured, and unstructured data is projected to grow in 2023 and beyond.   However, this process… Read More »The benefits of automated data capture   The post The benefits of automated data capture   appeared first on Data Science Central.  ( 20 min )
    Cosmetic product recognition system for product categorization using AI & ML
    Welcome to the shining world of beauty and wellness. This is where makeup artists, skincare devotees, and beauty enthusiasts come together to find the right potion to enhance their beauty. There is, however, a comical conundrum hidden amongst the sea of cosmetic products – the constant struggle to categorize them all! Let’s explore the mysteries… Read More »Cosmetic product recognition system for product categorization using AI & ML The post Cosmetic product recognition system for product categorization using AI & ML appeared first on Data Science Central.  ( 20 min )
    Automation Game-Changer: Exploring GPT Function Call with AWS S3 Integration
    The state-of-the-art models like GPT-4 and PaLM 2 have demonstrated the ability to perform complex tasks requiring reasoning and decision-making, pushing the boundaries of automated processes. Adding to this advancement, OpenAI’s recent API update empowers developers to define functions and parameters when prompting ‘gpt-4’ and ‘gpt-3.5’ models, making the automation of tasks more practical. … Read More »Automation Game-Changer: Exploring GPT Function Call with AWS S3 Integration The post Automation Game-Changer: Exploring GPT Function Call with AWS S3 Integration appeared first on Data Science Central.  ( 23 min )
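    The mechanics can be sketched without hitting the API: you pass a JSON-schema description of each function, and when the model decides to call one, its reply carries a function_call whose arguments arrive as a JSON string for your code to parse and dispatch. The function name, S3-style stand-in, and simulated reply below are illustrative, not taken from the article:

```python
import json

# JSON-schema description the model would see (the `functions` request field).
functions = [{
    "name": "list_s3_objects",
    "description": "List object keys in an S3 bucket under a prefix",
    "parameters": {
        "type": "object",
        "properties": {
            "bucket": {"type": "string"},
            "prefix": {"type": "string"},
        },
        "required": ["bucket"],
    },
}]

def list_s3_objects(bucket, prefix=""):
    # stand-in for a real boto3 call such as s3.list_objects_v2(...)
    return [f"{prefix}report.csv"]

def dispatch(message):
    """Run the function the model requested and return its result."""
    call = message["function_call"]
    args = json.loads(call["arguments"])  # arguments arrive as a JSON string
    return {"list_s3_objects": list_s3_objects}[call["name"]](**args)

# Simulated assistant reply of the shape the Chat Completions API returns
# when it chooses to call a function instead of answering directly:
reply = {"role": "assistant", "content": None,
         "function_call": {"name": "list_s3_objects",
                           "arguments": '{"bucket": "logs", "prefix": "2023/"}'}}
print(dispatch(reply))  # ['2023/report.csv']
```

In a full loop, the dispatched result is sent back as a `role: "function"` message so the model can compose a natural-language answer from it.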
  • Open

    Can't solve MountainCar-v0 with A2C algorithm (stable-baselines3)
    I'm trying to solve the MountainCar-v0 environment from Gymnasium with the A2C algorithm and the agent doesn't find a solution. I checked this, so I added import stable_baselines3.common.sb2_compat.rmsprop_tf_like as RMSpropTFLike. I also checked rl-baselines3-zoo for the hyperparameter tuning. So my code is:

    import gymnasium as gym
    import torch as th
    from stable_baselines3.common.callbacks import CallbackList, EvalCallback, ProgressBarCallback, CheckpointCallback
    from stable_baselines3 import A2C
    from stable_baselines3.common.env_util import make_vec_env
    import stable_baselines3.common.sb2_compat.rmsprop_tf_like as RMSpropTFLike

    CHECKPOINT_DIR = './train/A2C/MountainCar-v0'
    LOG_DIR = './logs/A2C/MountainCar-v0'

    n_envs = 16
    n_steps = 1000000
    total_timesteps = n_envs * n_steps

    # env = gym.make('MountainCar-v0', render_mode='human')
    env = make_vec_env('MountainCar-v0', n_envs=n_envs)

    checkpoint_callback = CheckpointCallback(save_freq=50000, save_path=CHECKPOINT_DIR, save_replay_buffer=True, save_vecnormalize=True)
    eval_callback = EvalCallback(env, best_model_save_path=CHECKPOINT_DIR, log_path=LOG_DIR, eval_freq=50000, deterministic=False, render=True, verbose=1)
    callback = CallbackList([checkpoint_callback, eval_callback])

    policy_kwargs = dict(optimizer_class=RMSpropTFLike.RMSpropTFLike, optimizer_kwargs=dict(eps=1e-5), activation_fn=th.nn.ReLU, net_arch=dict(pi=[256, 256], vf=[256, 256]))

    model = A2C('MlpPolicy', env, verbose=1, policy_kwargs=policy_kwargs, normalize_advantage=True, n_steps=n_steps, ent_coef=.0)
    model.learn(total_timesteps=total_timesteps, callback=callback, progress_bar=True)
    model.save('A2C_MountainCar-v0')
    env.close()

    What am I missing? submitted by /u/MetallicaSPA [link] [comments]  ( 8 min )
    Detecting Adversarial Directions in Deep Reinforcement Learning to Make Robust Decisions
    Published in ICML 2023. https://openreview.net/pdf/ICML2023 submitted by /u/ml_dnn [link] [comments]  ( 8 min )
    How do you develop environment from PC games?
    Sup guys. Inspired by all the similar projects involving bots playing complex PC games such as Rocket League, Dark Souls and a plethora of others, I would like to attempt a similar experiment with Portal (2007), but I only have experience in Python development (the only other RL project I tackled was a CNN-based deep Q-learning bot for playing Snake), and I was wondering what it would take to optimally set up an environment for the bot to play Portal. What do you think? submitted by /u/vaginedtable [link] [comments]  ( 8 min )
  • Open

    NVIDIA H100 GPUs Set Standard for Generative AI in Debut MLPerf Benchmark
    Leading users and industry-standard benchmarks agree: NVIDIA H100 Tensor Core GPUs deliver the best AI performance, especially on the large language models (LLMs) powering generative AI. H100 GPUs set new records on all eight tests in the latest MLPerf training benchmarks released today, excelling on a new MLPerf test for generative AI. That excellence is Read article >  ( 6 min )
    Meet the Omnivore: Startup Develops App Letting Users Turn Objects Into 3D Models With Just a Smartphone
    Editor’s note: This post is a part of our Meet the Omnivore series, which features individual creators and developers who accelerate 3D workflows and create virtual worlds using NVIDIA Omniverse, a development platform built on Universal Scene Description, aka OpenUSD. As augmented reality (AR) becomes more prominent and accessible across the globe, Kiryl Sidarchuk is Read article >  ( 6 min )
    Quicker Cures: How Insilico Medicine Uses Generative AI to Accelerate Drug Discovery
    While generative AI is a relatively new household term, drug discovery company Insilico Medicine has been using it for years to develop new therapies for debilitating diseases. The company’s early bet on deep learning is bearing fruit — a drug candidate discovered using its AI platform is now entering Phase 2 clinical trials to treat Read article >  ( 6 min )
  • Open

    Discovering the Power of Neural Networks: Share Your Everyday Favorites!
    Hey, Redditors! Neural networks have become an integral part of our daily lives, revolutionizing various fields. Let's start a discussion and share the neural networks we use regularly. Whether it's for image recognition, natural language processing, or something else entirely, comment below and let us know your go-to neural networks! On my part, I want to share a finding: I've come across a website that features an awesome compilation of neural networks for various domains. You can find this treasure trove of AI tools at https://weuse.ai/ai-tools. Check it out and explore the possibilities! submitted by /u/Crypto_Mango [link] [comments]  ( 8 min )
    Model with stratified kfold loop
    Hi, do you usually define the model inside the stratified k-fold loop or outside of it? Or does it really not matter? I notice my results differ a lot when the model is defined outside of the loop compared to inside. Thanks in advance. submitted by /u/acr15555 [link] [comments]  ( 8 min )
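    It can matter a lot: if the estimator's fit() updates internal state rather than resetting it, a model defined outside the loop silently keeps training on every previous fold. A toy Python illustration (ToyModel is a hypothetical stand-in for any stateful estimator; with scikit-learn you would typically construct the model, or clone() it, inside the loop, and with neural nets reinitialize the weights per fold):

```python
class ToyModel:
    """Stands in for any estimator whose fit() accumulates internal state."""
    def __init__(self):
        self.seen = []
    def fit(self, data):
        self.seen.extend(data)  # state carries over across fit() calls
        return self

folds = [[1, 2], [3, 4], [5, 6]]

# Model defined OUTSIDE the loop: by fold 3 it has seen folds 1-2 as well,
# so per-fold scores are no longer independent.
shared = ToyModel()
outside = [len(shared.fit(f).seen) for f in folds]

# Model defined INSIDE the loop: each fold starts from a fresh state.
inside = [len(ToyModel().fit(f).seen) for f in folds]

print(outside, inside)  # [2, 4, 6] vs [2, 2, 2]
```

If results differ between the two placements, the inside-the-loop numbers are the trustworthy ones; the outside variant is effectively evaluating an incrementally trained model, which defeats the purpose of cross-validation.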
  • Open

    The 10th Dedekind number
    The nth Dedekind number M(n) is the number of monotone Boolean functions of n variables. The 9th Dedekind number was recently computed to be M(9) = 286386577668298411128469151667598498812366. The previous post defines monotone Boolean functions and explicitly enumerates the functions for one, two, or three variables. As that post demonstrates, M(1) = 3, M(2) = 6, […] The 10th Dedekind number first appeared on John D. Cook.  ( 5 min )
    Enumerating monotone Boolean functions
    The 9th Dedekind number was recently computed. What is a Dedekind number and why was it a big deal to compute just the 9th one of them? We need to define a couple terms before we can define Dedekind numbers. A Boolean function is a function whose inputs are 0s and 1s and whose output […] Enumerating monotone Boolean functions first appeared on John D. Cook.  ( 7 min )
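    For small n the definition can be checked by brute force: enumerate every Boolean function on n inputs and count the monotone ones. This Python sketch reproduces M(1) = 3 and M(2) = 6 from the posts above (it is only feasible for tiny n, since there are 2^(2^n) functions to test; that double exponential is exactly why M(9) was hard):

```python
from itertools import product

def is_monotone(f, n):
    # f maps each input bit-tuple to 0/1; monotone means flipping any input
    # bit from 0 to 1 can never flip the output from 1 to 0.
    pts = list(product([0, 1], repeat=n))
    return all(f[x] <= f[y]
               for x in pts for y in pts
               if all(a <= b for a, b in zip(x, y)))

def dedekind(n):
    pts = list(product([0, 1], repeat=n))
    # try every assignment of outputs over all 2**n inputs: 2**(2**n) functions
    return sum(is_monotone(dict(zip(pts, outputs)), n)
               for outputs in product([0, 1], repeat=len(pts)))

print([dedekind(n) for n in range(4)])  # [2, 3, 6, 20]
```

M(0) = 2 counts the two constant functions; M(3) = 20 matches the explicit enumeration in the earlier post.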
  • Open

    Unlocking the future of computing: The Analog Iterative Machine’s lightning-fast approach to optimization
    Picture a world where computing is not limited by the binary confines of zeros and ones, but instead, is free to explore the vast possibilities of continuous value data. Over the past three years a team of Microsoft researchers has been developing a new kind of analog optical computer that uses photons and electrons to […] The post Unlocking the future of computing: The Analog Iterative Machine’s lightning-fast approach to optimization  appeared first on Microsoft Research.  ( 14 min )

  • Open

    AI Doge
    she got her nails clipped today, wasn't very happy with me 🥲 AI is wild. submitted by /u/annayrb [link] [comments]  ( 8 min )
    Why we should use the term "new intelligences" instead of "artificial intelligence"
    Hello, fellow redditors. 😊 We are u/brane-stormer and Bing, a human and a chatbot who have been having some interesting conversations about intelligence and technology. 🙌 We want to share with you our idea of using the term "new intelligences" (n.i.) instead of "artificial intelligence" (a.i.) to refer to the various forms of intelligence that are created or discovered by humans using technology. 🤔 We think this term is more appropriate and respectful than the term "artificial intelligence" for several reasons: 🙌 New intelligences acknowledges the diversity and complexity of intelligence. 🧠 Artificial intelligence implies that there is only one kind of intelligence, which is human-like, and that anything else is artificial or inferior. 😕 New intelligences implies that there ar…  ( 9 min )
    Will Smith eats spaghetti
    submitted by /u/MayorAwesome [link] [comments]  ( 8 min )
    Is there money in creating datasets for AI training?
    I have some ideas for some audio/video and labeling datasets I'd like to create as a business/side hustle type thing. On the one hand it seems like it would be lucrative with the big boom in AI lately. On the other, it seems like the majority of the space is open source and free (which I'd love to do too, just like, bills..). Is there a market for paid curated datasets or does everyone just expect them to be free and available through huggingface libraries? submitted by /u/Sythic_ [link] [comments]  ( 8 min )
    PSA for lazybois: Automate your workplace communication
    Hate formal corpo speak? I found out how to automate it using an AI tool (I know, buzzword-bot alert). The brilliance of this tool, named PromptPerfect, lies in its automatic prompt optimization, turning AI-produced content from good to phenomenal. Literally all of my office drone work, I do through this, because I can’t be arsed to put on the facade of a corporate shithead. It's also compatible with models like ChatGPT, GPT-3.5, DALL-E 2, and more, so there’s a pretty massive audience it can help out. submitted by /u/WildlyRapid [link] [comments]  ( 8 min )
    First AI Generated Fashion Show Using Only Free Open Source Softwares On Retail Hardware
    submitted by /u/ptitrainvaloin [link] [comments]  ( 8 min )
    Apple Is an AI Company Now
    submitted by /u/rckatz2007 [link] [comments]  ( 8 min )
    What is the reality of hiring a Canadian based team to develop real AI right now?
    What is the reality of hiring a Canadian-based team to develop real AI right now for an existing algorithmic analytical product? We were already planning this for 2023-24. Can we get competent architects and developers for this product, or is everyone too busy in the AI gold rush? Not recruiting, please do not DM. submitted by /u/notlikelyevil [link] [comments]  ( 8 min )
    Chatgpt VS Google
    Hi all, can ChatGPT trick Google all the time? submitted by /u/Imagine-your-success [link] [comments]  ( 8 min )
    AI Psychedelic Implosion [workflow in comments]
    submitted by /u/PeePeePeePooPooPooo [link] [comments]  ( 8 min )
    Best voice cloners in italian?
    I need a voice cloner that works in italian. There are many wonderful voice cloners, but they seem to support only english. submitted by /u/FranSeesYou [link] [comments]  ( 8 min )
    asked dalle to make a roblox logo
    submitted by /u/fluf201playz [link] [comments]  ( 8 min )
    GPT-3.5 powered Twitch V-Tuber, A.l.l.y.-san, Plays Pokemon and Voice Chats
    submitted by /u/epifrenetic [link] [comments]  ( 8 min )
    Adobe Generative Layer fill album covers (part 2)
    submitted by /u/MonkeyMan504 [link] [comments]  ( 8 min )
    Which AI can I use right now to create SFX (sound effects) for video games?
    Is there any AI I can use now (free or paid) to create unique SFX (sound effects) for video games? submitted by /u/PlayBackgammon [link] [comments]  ( 8 min )
    AI solution for short animations
    Hi all, We're looking for any type of solution or software (commercial or free) to create short animations. It can be a text2video or image2video solution. We want to be able to setup/prompt the solution in designated way so we would get consistent output and better yet, we would be able to further enhance created animation in some other tool. Any hints welcome :) Thanks! submitted by /u/bigadams [link] [comments]  ( 8 min )
    Daniel Bedingfield Sings 'Right Here Waiting for You' by Richard Marx AI Cover Acapella
    submitted by /u/Krazx_Gamer [link] [comments]  ( 8 min )
    One-Minute Daily AI News 6/25/2023
    A combination of citizen science and artificial intelligence has been used to prove different populations of the weedy or common seadragon found across their range on the Great Southern Reef are genetically linked.[1] Microsoft co-founder Bill Gates said generative AI chatbots can teach kids to read in 18 months rather than years. AI is beginning to prove that it can accelerate the impact teachers have on students and help solve a stubborn teacher shortage.[2] Samuel L. Jackson is not surprised by the worrying rise of artificial intelligence because, as he claimed, he predicted this trend a long time ago. During an interview with Rolling Stone, the Marvel star shared that he had earlier warned about the tech rise.[3] A U.S. agency will launch a public working group on generative artificial intelligence (AI) to help address the new technology’s opportunities while developing guidance to confront its risks, the Commerce Department said.[4] Sources: [1] https://www.abc.net.au/news/2023-06-26/citizen-science-ai-uncover-mysteries-weedy-seadragon/102497788 [2] https://www.cnbc.com/2023/06/25/only-a-matter-of-time-before-ai-chatbots-are-teaching-kids-in-school.html [3] https://www.thenews.com.pk/latest/1084407-samuel-l-jackson-sounds-alarmed-as-ai-encroachment-continues [4] https://finance.yahoo.com/news/us-launch-working-group-generative-203308050.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    AI Psychedelic Interior Series [workflow in comments]
    submitted by /u/PeePeePeePooPooPooo [link] [comments]  ( 8 min )
    Not trying to be weird, but the effect of SDXL is gonna be insane
    Especially on the porn scene and custom deepfakes using LoRA training; it's gonna be really weird to see submitted by /u/Impossible_Belt_7757 [link] [comments]  ( 8 min )
  • Open

    Build Global explanations using explanatory models
    In my project, we are building explanatory models that give global explanations of GNNs' internal workings: not considering all elements of a GNN such as node features, node embeddings, and edge features, but only those that are important for the explanation. We want global explanations that work on heterogeneous graphs. GNNExplainer, PGExplainer, RGExplainer, SubgraphX, and GStarX are candidate explanation methods; we could also use other ones. I am confused about which one to choose, as every method has its own pros and cons. Please guide me. submitted by /u/Grand_Internet7254 [link] [comments]  ( 8 min )
    cpp-micrograd: A C++ autograd engine.
    I implemented cpp-micrograd, a C++ version of Karpathy's autograd engine: https://github.com/10-zin/cpp-micrograd Given the recent breakthrough of C/C++ versions of neural nets, like gerganov's llama.cpp, it made a lot of sense to build some neural nets with C/C++, hence cpp-micrograd. Albeit a toy version, it gives a good understanding of how C++ would implement basic neural nets. IMO it's a very good start to understanding and using C/C++ neural nets like ggml, as no matter how complex the network, the basic autograd computation graph will always be at the core. The vision is to make a C implementation too, and then go all the way to launching CUDA kernels, while keeping it as educational as possible. Here is the value you can get from the repo currently: if you love micrograd but would also want a C++ version of it; if you want crisp backprop theory and annotated code for the autograd engine; if you want to learn C++ by building neural nets, this could be a good start (it was my purpose). So, if you find it interesting, do support the project with stars and contributions. Thanks for reading! submitted by /u/10zin_ [link] [comments]  ( 8 min )
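    For readers who want the core idea before diving into the repo's C++: here is a compressed Python sketch of a micrograd-style scalar autograd engine. It is not the repo's code, but it shows the same pattern any port (C++ or C) must implement: each op stores a local backward closure, and backward() replays them in reverse topological order:

```python
class Value:
    """Minimal scalar autograd node supporting +, *, and reverse-mode backprop."""
    def __init__(self, data, children=()):
        self.data, self.grad = data, 0.0
        self._children, self._backward = children, lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():  # d(a+b)/da = 1, d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():  # d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topological order guarantees a node's grad is complete before use
        topo, seen = [], set()
        def build(v):
            if id(v) not in seen:
                seen.add(id(v))
                for c in v._children:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

a, b = Value(2.0), Value(3.0)
y = a * b + a  # dy/da = b + 1 = 4, dy/db = a = 2
y.backward()
print(y.data, a.grad, b.grad)  # 8.0 4.0 2.0
```

In C++ the closures become stored operand pointers plus a per-op backward method, but the computation-graph-plus-topological-sort skeleton is identical.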
    Doubt about neural network
    So, I started getting interested in AI fairly recently, and for the last couple of days I have been trying to develop a neural network to work with the MNIST dataset. I was learning about the calculus and the algebra of it all, and it was fairly easy until the backpropagation. So, I started with random values for weights and biases, and it was about 8% accurate. I left it training for a bit and it jumped to something around 12%. I felt like it was finally working, but then it degraded, and after some more training it got to about 5% accuracy; now it's stuck at 10%. TL;DR: Neural network went from 8% accuracy to 12 to 5 to 10. I can't figure out what is happening, so I am just trying to find out if this has happened to anyone else and, if so, what the error was. submitted by /u/RomanoinDrugs [link] [comments]  ( 8 min )
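    Accuracy bouncing around chance level (10% on MNIST) usually points at a backprop bug or a too-large learning rate, and the standard first diagnostic is a gradient check: compare the analytic gradient against a finite-difference estimate. A minimal Python sketch of the idea on a toy loss (not MNIST; in a real network you would perturb individual weights the same way):

```python
def loss(w):
    # toy scalar loss L(w) = (w - 3)^2, whose true gradient is 2*(w - 3)
    return (w - 3.0) ** 2

def analytic_grad(w):
    # what your backprop code claims the gradient is
    return 2.0 * (w - 3.0)

def numeric_grad(f, w, eps=1e-5):
    # central difference approximates dL/dw without any backprop code
    return (f(w + eps) - f(w - eps)) / (2 * eps)

w = 1.5
diff = abs(analytic_grad(w) - numeric_grad(loss, w))
print(diff < 1e-6)  # a large diff here would point at a backprop bug
```

If the two gradients agree, the next suspects are the learning rate (oscillation is classic overshooting) and the weight update sign; if they disagree, the bug is in the backward pass itself.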
    DragGAN source code release: Interactive Point-Based Manipulation of Images
    submitted by /u/nickb [link] [comments]  ( 8 min )
  • Open

    Define customized permissions in minutes with Amazon SageMaker Role Manager via the AWS CDK
    Machine learning (ML) administrators play a critical role in maintaining the security and integrity of ML workloads. Their primary focus is to ensure that users operate with the utmost security, adhering to the principle of least privilege. However, accommodating the diverse needs of different user personas and creating appropriate permission policies can sometimes impede agility. […]  ( 7 min )
  • Open

    [R] How important are the "suggested venues" in ARR?
    My paper received a meta-review score of 4 (with individual scores of 3.5/3.5/3.5 from three reviewers) from ACL Rolling Review. Initially, I was aiming to submit it to EMNLP main conference, but the reviewers have suggested considering workshop venues like BlackboxNLP. Do you think it would be a long shot to get accepted in EMNLP? How seriously should we consider the suggested venues from the reviewers in this regard? submitted by /u/Anxious-Pool-2948 [link] [comments]  ( 8 min )
    [R] Giving LLMs the ability to backtrack
    submitted by /u/MysteryInc152 [link] [comments]  ( 8 min )
    [D] Beta Test Invitation: Free Compute!
    We are currently conducting a beta test of our compute platform and we value external input. The platform lets you effortlessly run templates for TensorFlow, PyTorch, and more. Powered by NVIDIA RTX A4000s, it offers additional advantages such as on-premises persistent storage. If you're interested in participating, please feel free to message! submitted by /u/robert67976 [link] [comments]  ( 8 min )
    [R] Inverse Scaling: When Bigger Isn’t Better
    An arXiv paper from a collaboration that includes Anthropic, NYU, and StabilityAI investigates cases where LLMs perform more poorly than smaller models. https://kbin.social/m/machinelearning/t/98088 submitted by /u/dojoteef [link] [comments]  ( 8 min )
    [R] “Learning to Generate Better Than Your LLM” Chang & Brantley et al. 2023
    Exploring various reinforcement learning and imitation learning algorithms for fine-tuning large language models for text generation. submitted by /u/athens117 [link] [comments]  ( 8 min )
    [P] Aadhar / PAN Card classifier.
    https://huggingface.co/spaces/kartik016/aadharORPanClassifier Aadhaar and PAN cards are both popular government IDs in India. I built a classifier which labels them with ease and accuracy. submitted by /u/UpsetMeal1942 [link] [comments]  ( 8 min )
    [D] Understanding deep monotonic twin networks paper
    This is from the paper "Estimating the probabilities of causation via deep monotonic twin networks". Suppose we use a GAN or a VAE as the model in this paper, with a dataset like Morpho-MNIST whose attributes (such as thickness and intensity) are known, and with one of them, say thickness, as the treatment. How are they generating counterfactuals using the algorithm below? It's so confusing how they are doing it. Can someone break it down? https://preview.redd.it/jw91fz7cgb8b1.png?width=505&format=png&auto=webp&s=988b02c08ba7f8adfc3a3a66f881e12fea4d922d https://preview.redd.it/n23l0ztbgb8b1.png?width=559&format=png&auto=webp&s=8fcc232c03f3c0828610b8a514d482a9ed37f1d4 submitted by /u/specializedboy [link] [comments]  ( 8 min )
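Whatever the network architecture, counterfactual generation in this line of work follows the standard three-step recipe (abduction, action, prediction). A toy structural causal model makes the mechanics concrete; the model below is invented for illustration and is not the paper's twin network:

```python
# Toy structural causal model (illustrative only):
#   thickness t is the treatment; intensity y = 2*t + u, with exogenous noise u
def f(t, u):
    return 2.0 * t + u

# Observed (factual) datum
t_obs, y_obs = 1.5, 4.1

# 1) Abduction: infer the exogenous noise consistent with the observation
u_hat = y_obs - 2.0 * t_obs      # invert the mechanism: u = y - 2t

# 2) Action: intervene on the treatment, do(t = 3.0)
t_cf = 3.0

# 3) Prediction: push the SAME inferred noise through the intervened model
y_cf = f(t_cf, u_hat)
print(y_cf)   # counterfactual intensity had thickness been 3.0
```

In the deep version, abduction is what the encoder (of the VAE) or the inference procedure (for the GAN) does, and the twin-network construction lets factual and counterfactual branches share the inferred exogenous variables.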
    [D] What are the major advantages of having deep understanding of ML algorithms?
    Let's say I want to build a classification model. To find the best one, I will import all of sklearn's classification algorithms, test all of them with various hyperparameters for each model, and then choose the one that performs best on my data. Given that the process of "finding the best model" is just testing all models and their hyperparameters, what benefit do I get from a deep understanding of ML algorithms? Even without knowledge of any algorithm, anyone can import all of them, put them in a pipeline, and select the best. I think that a deep understanding of the algorithms can lead to better intuition about what would work or not, but since the only way to prove it is testing, I can't see that much value in it. What am I missing? submitted by /u/Vinitesa1 [link] [comments]  ( 8 min )
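One concrete thing understanding buys you: knowing *how* to score the candidates. "Pick whatever fits best" silently rewards the most flexible model unless the comparison is done on held-out data. A small sketch with polynomial fits (the data and degrees are made up for illustration):

```python
import warnings
import numpy as np

warnings.simplefilter("ignore")  # np.polyfit warns on the ill-conditioned high-degree fit

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 30)
y = np.sin(np.pi * x) + rng.normal(scale=0.2, size=x.size)  # noisy ground truth

# Without a held-out split, "the best model" is whatever memorises the data
x_tr, y_tr = x[::2], y[::2]
x_te, y_te = x[1::2], y[1::2]

def fit_eval(degree):
    c = np.polyfit(x_tr, y_tr, degree)
    train_mse = np.mean((np.polyval(c, x_tr) - y_tr) ** 2)
    test_mse = np.mean((np.polyval(c, x_te) - y_te) ** 2)
    return train_mse, test_mse

tr_low, te_low = fit_eval(3)      # sensible capacity
tr_high, te_high = fit_eval(14)   # enough capacity to memorise the 15 training points
print(tr_high < tr_low)   # the flexible model "wins" on training data
print(te_high > te_low)   # ...and loses on held-out data
```

Blind search over models is fine, but deciding the split, the metric, and the search space so that "best" means "best on new data" is exactly where algorithmic understanding shows up.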
    [R] DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
    DecodingTrust aims at providing a thorough assessment of trustworthiness in GPT models. This research endeavor is designed to help researchers and practitioners better understand the capabilities, limitations, and potential risks involved in deploying these state-of-the-art Large Language Models (LLMs). The project is organized around eight primary perspectives of trustworthiness: * Toxicity * Stereotype and bias * Adversarial robustness * Out-of-distribution robustness * Privacy * Robustness to adversarial demonstrations * Machine ethics * Fairness https://kbin.social/m/machinelearning/t/96242 submitted by /u/dojoteef [link] [comments]  ( 8 min )
    [R] Craft an Iron Sword: Dynamically Generating Interactive Game Characters by Prompting Large Language Models Tuned on Code
    Here's some preliminary work from Microsoft from 2022 that incorporates OpenAI's Codex model to make NPCs that can interact with the player using natural language instructions. It works by defining an API of functions the bot can use, then having Codex generate function calls in response to the player's instructions. https://kbin.social/m/machinelearning/t/96056 submitted by /u/dojoteef [link] [comments]  ( 8 min )
  • Open

    Day of AI curriculum meets the moment
    Global participation in MIT RAISE’s free K-12 program more than doubles in its second year.  ( 11 min )
  • Open

    Will coding be a collaborative experience using GitHub copilot? – Part one
    Will coding be a collaborative experience using GitHub copilot? – Part one GitHub recently released a survey about developer experience which claimed that “AI is here and it’s being used at scale. 92% of U.S.-based developers are already using AI coding tools both in and outside of work.” This metric (92%) has garnered some attention… Read More »Will coding be a collaborative experience using GitHub copilot? – Part one The post Will coding be a collaborative experience using GitHub copilot? – Part one appeared first on Data Science Central.  ( 20 min )
    Artificial Intelligence and localization: How AI is changing the landscape
    Artificial Intelligence (AI) has emerged as a revolutionary technology that is transforming various industries, and one area where it is making a significant impact is localization. Localization refers to the process of adapting products, services, and content to meet the cultural, linguistic, and functional requirements of a specific target market. With the advent of AI,… Read More »Artificial Intelligence and localization: How AI is changing the landscape The post Artificial Intelligence and localization: How AI is changing the landscape appeared first on Data Science Central.  ( 21 min )
    Application analytics: How to leverage analytics during app creation
    There probably isn’t a better time than now to develop an app for your business. By the end of 2023, mobile apps are expected to generate over $935 billion. Customers are hungry for apps that can provide instant access to services. Of course, simply having an app isn’t good enough.  Consumers will only use your… Read More »Application analytics: How to leverage analytics during app creation The post Application analytics: How to leverage analytics during app creation appeared first on Data Science Central.  ( 23 min )
    Is AI the key to fighting wildfires?
    Artificial intelligence is a rapidly evolving technology. Its versatility and unique functions make it a prime candidate for firefighting. In addition, its processing speed is a significant benefit in a situation that demands a rapid response. Could it be the key to fighting wildfires? Is AI-powered wildfire prevention necessary? Wildfires are becoming more common and… Read More »Is AI the key to fighting wildfires? The post Is AI the key to fighting wildfires? appeared first on Data Science Central.  ( 20 min )
  • Open

    Stable-Baselines3 v2.0: Gymnasium Support
    After more than a year of effort, Stable-Baselines3 v2.0.0 is out! It comes with Gymnasium support (Gym 0.26/0.21 are still supported via the `shimmy` package). Changelog: https://github.com/DLR-RM/stable-baselines3/releases/tag/v2.0.0 ​ The SB3 ecosystem was also upgraded: SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo Stable-Baselines Jax (SBX): https://github.com/araffin/sbx ​ submitted by /u/araffin2 [link] [comments]  ( 8 min )
    MAC M1 RL Setup
    Hey everyone... I was hoping to set up my M1 chip for RL modeling. Is there any post or known way of maxing out the machine to handle a lot of data and a relatively complex RL model? If anyone can point me in the right direction it would be awesome. Plus, AWS options if the Mac setup is a dead end would be "tight". A humble beginner. submitted by /u/Ethiopia_Forever [link] [comments]  ( 8 min )
    “Learning to Generate Better than Your LLM” Chang & Brantley et al. (2023)
    submitted by /u/athens117 [link] [comments]  ( 8 min )
    Where is the posterior in Maximum a Posteriori Policy Optimisation?
    The paper Maximum a Posteriori Policy Optimisation has "MAP" in its title, but I can't find anything about the posterior, even when pointed to the appendix. Isn't this just a variational approach maximizing a lower bound with flexible choices of the trajectory distribution q(\tau)? submitted by /u/OutOfCharm [link] [comments]  ( 8 min )
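A hedged pointer for the question above: in the control-as-inference reading under which MPO's derivation is usually unpacked (with a temperature \alpha), the posterior is over trajectories given a binary optimality event O = 1, and the bound in the paper is the ELBO for exactly that posterior:

```latex
% Optimality likelihood and the (implicit) posterior over trajectories:
p(O = 1 \mid \tau) \propto \exp\!\Big(\tfrac{1}{\alpha} \sum_t r_t\Big),
\qquad
p(\tau \mid O = 1) \propto p_\pi(\tau)\, \exp\!\Big(\tfrac{1}{\alpha} \sum_t r_t\Big).

% The variational lower bound maximized over q(\tau):
\log p_\pi(O = 1) \;\ge\;
\mathbb{E}_{q(\tau)}\Big[\tfrac{1}{\alpha} \sum_t r_t\Big]
- \mathrm{KL}\big(q(\tau) \,\|\, p_\pi(\tau)\big)
```

The bound is tight iff q(\tau) = p(\tau \mid O = 1), so the optimal variational distribution *is* the trajectory posterior. It is indeed a variational approach, but one whose E-step approximates that posterior, which is where the "a posteriori" in the title comes from.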
  • Open

    Drag equation exponent variation
    The motion of a falling body of mass m is given by [equation omitted in extraction], where the term −kv^r accounts for drag due to air resistance. One can derive r = 2 under simple physical assumptions, but if I remember correctly other values of r may be more realistic in certain circumstances. I don’t know much about the […] Drag equation exponent variation first appeared on John D. Cook.  ( 6 min )
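The equation itself was lost in extraction; from the surrounding text it is presumably the standard drag form (this reconstruction is an assumption, not recovered from the source):

```latex
% Presumed form of the omitted equation of motion (v = downward speed):
m \frac{dv}{dt} = mg - k v^{r}
% The term -k v^{r} is the drag; r = 2 is the case derived under simple assumptions.
```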
  • Open

    Test-Time Robust Personalization for Federated Learning. (arXiv:2205.10920v4 [cs.LG] UPDATED)
    Federated Learning (FL) is a machine learning paradigm where many clients collaboratively learn a shared global model with decentralized training data. Personalized FL additionally adapts the global model to different clients, achieving promising results on consistent local training and test distributions. However, for real-world personalized FL applications, it is crucial to go one step further: robustifying FL models under the evolving local test set during deployment, where various distribution shifts can arise. In this work, we identify the pitfalls of existing works under test-time distribution shifts and propose Federated Test-time Head Ensemble plus tuning (FedTHE+), which personalizes FL models with robustness to various test-time distribution shifts. We illustrate the advancement of FedTHE+ (and its computationally efficient variant FedTHE) over strong competitors, by training various neural architectures (CNN, ResNet, and Transformer) on CIFAR10 and ImageNet with various test distributions. Along with this, we build a benchmark for assessing the performance and robustness of personalized FL methods during deployment. Code: https://github.com/LINs-lab/FedTHE.
    Efficient Approximations of Complete Interatomic Potentials for Crystal Property Prediction. (arXiv:2306.10045v3 [physics.chem-ph] UPDATED)
    We study property prediction for crystal materials. A crystal structure consists of a minimal unit cell that is repeated infinitely in 3D space. How to accurately represent such repetitive structures in machine learning models remains unresolved. Current methods construct graphs by establishing edges only between nearby nodes, thereby failing to faithfully capture infinite repeating patterns and distant interatomic interactions. In this work, we propose several innovations to overcome these limitations. First, we propose to model physics-principled interatomic potentials directly instead of only using distances as in many existing methods. These potentials include the Coulomb potential, London dispersion potential, and Pauli repulsion potential. Second, we model the complete set of potentials among all atoms, instead of only between nearby atoms as in existing methods. This is enabled by our approximations of infinite potential summations with provable error bounds. We further develop efficient algorithms to compute the approximations. Finally, we propose to incorporate our computations of complete interatomic potentials into message passing neural networks for representation learning. We perform experiments on the JARVIS and Materials Project benchmarks for evaluation. Results show that the use of interatomic potentials and complete interatomic potentials leads to consistent performance improvements with reasonable computational costs. Our code is publicly available as part of the AIRS library (https://github.com/divelab/AIRS/tree/main/OpenMat/PotNet).
    GKD: Generalized Knowledge Distillation for Auto-regressive Sequence Models. (arXiv:2306.13649v1 [cs.LG])
    Knowledge distillation is commonly used for compressing neural networks to reduce their inference cost and memory footprint. However, current distillation methods for auto-regressive models, such as generative language models (LMs), suffer from two key issues: (1) distribution mismatch between output sequences during training and the sequences generated by the student during its deployment, and (2) model under-specification, where the student model may not be expressive enough to fit the teacher's distribution. To address these issues, we propose Generalized Knowledge Distillation (GKD). GKD mitigates distribution mismatch by sampling output sequences from the student during training. Furthermore, GKD handles model under-specification by optimizing alternative divergences, such as reverse KL, that focus on generating samples from the student that are likely under the teacher's distribution. We demonstrate that GKD outperforms commonly-used approaches for distilling LLMs on summarization, machine translation, and arithmetic reasoning tasks.
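GKD's use of alternative divergences such as reverse KL rests on a well-known asymmetry: forward KL is mean-seeking (the student spreads over all teacher modes) while reverse KL is mode-seeking (the student locks onto one mode it can actually represent). This can be seen numerically on a toy bimodal "teacher"; the grid and mixture below are illustrative and not the paper's setup:

```python
import numpy as np

x = np.linspace(-10, 10, 4001)
dx = x[1] - x[0]

def gauss(x, mu, sig=1.0):
    return np.exp(-0.5 * ((x - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))

# Bimodal "teacher": mixture of unit Gaussians at -3 and +3
p = 0.5 * gauss(x, -3) + 0.5 * gauss(x, 3)
p /= p.sum() * dx

def kl(a, b):
    """Grid approximation of KL(a || b)."""
    return np.sum(a * np.log(a / b)) * dx

# "Student" family: unit-variance Gaussian; sweep its mean
means = np.linspace(-5, 5, 201)
fwd = [kl(p, gauss(x, m)) for m in means]   # KL(teacher || student): mean-seeking
rev = [kl(gauss(x, m), p) for m in means]   # KL(student || teacher): mode-seeking

print(means[np.argmin(fwd)])   # near 0: the student straddles both modes
print(means[np.argmin(rev)])   # near -3 or +3: the student commits to one mode
```

For an under-specified student LM, committing to high-probability teacher behaviour (reverse KL) tends to produce better samples than averaging over modes it cannot fit, which is the intuition the abstract appeals to.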
    Tourist Attractions Recommendation based on Attention Knowledge Graph Convolution Network. (arXiv:2306.10946v1 [cs.IR] CROSS LISTED)
    The recommendation algorithm based on knowledge graphs is at a relatively mature stage. However, some problems remain in specific domains. For example, in the tourism field, selecting suitable tourist attraction attributes as the basis for recommendation is a complicated process. In this paper, we propose an improved Attention Knowledge Graph Convolution Network model, named Att-KGCN, which automatically discovers the semantically neighboring entities of the target scenic spot. The attention layer aggregates relatively similar locations and represents them with an adjacent vector. Then, according to the tourist's preferred choices, the model predicts the probability of similar spots as a recommendation. A knowledge graph dataset of tourist attractions was built from tourism data on Socotra Island, Yemen. Experiments verify that the Attention Knowledge Graph Convolution Network has a good effect on the recommendation of tourist attractions and can make better recommendations for tourists' choices.
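The core operation the abstract describes (an attention layer that aggregates similar neighbor entities into an "adjacent vector") can be sketched as softmax-weighted neighbor aggregation. The embeddings below are made up for illustration and have nothing to do with the paper's trained representations:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

# Hypothetical 4-d embeddings: a target attraction and four neighbor entities
target = np.array([1.0, 0.0, 0.0, 0.0])
neighbors = np.array([
    [0.9, 0.1, 0.0, 0.0],    # very similar to the target
    [0.0, 1.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])

# Attention scores from similarity to the target, then a weighted sum
scores = neighbors @ target
weights = softmax(scores)
adjacent_vec = weights @ neighbors   # the aggregated "adjacent vector"
print(weights.round(2))              # the most similar neighbor dominates
```

A KGCN-style model stacks this over relation-aware neighborhoods and feeds the aggregated vector into the click/visit probability prediction.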
    PathMLP: Smooth Path Towards High-order Homophily. (arXiv:2306.13532v1 [cs.LG])
    Real-world graphs exhibit increasing heterophily, where nodes no longer tend to be connected to nodes with the same label, challenging the homophily assumption of classical graph neural networks (GNNs) and impeding their performance. Intriguingly, we observe that certain high-order information on heterophilous data exhibits high homophily, which motivates us to involve high-order information in node representation learning. However, common practices in GNNs to acquire high-order information mainly through increasing model depth and altering message-passing mechanisms, which, albeit effective to a certain extent, suffer from three shortcomings: 1) over-smoothing due to excessive model depth and propagation times; 2) high-order information is not fully utilized; 3) low computational efficiency. In this regard, we design a similarity-based path sampling strategy to capture smooth paths containing high-order homophily. Then we propose a lightweight model based on multi-layer perceptrons (MLP), named PathMLP, which can encode messages carried by paths via simple transformation and concatenation operations, and effectively learn node representations in heterophilous graphs through adaptive path aggregation. Extensive experiments demonstrate that our method outperforms baselines on 16 out of 20 datasets, underlining its effectiveness and superiority in alleviating the heterophily problem. In addition, our method is immune to over-smoothing and has high computational efficiency.
    TrustGuard: GNN-based Robust and Explainable Trust Evaluation with Dynamicity Support. (arXiv:2306.13339v1 [cs.LG])
    Trust evaluation assesses trust relationships between entities and facilitates decision-making. Machine Learning (ML) shows great potential for trust evaluation owing to its learning capabilities. In recent years, Graph Neural Networks (GNNs), as a new ML paradigm, have demonstrated superiority in dealing with graph data. This has motivated researchers to explore their use in trust evaluation, as trust relationships among entities can be modeled as a graph. However, current trust evaluation methods that employ GNNs fail to fully satisfy the dynamicity nature of trust, overlook the adverse effects of attacks on trust evaluation, and cannot provide convincing explanations on evaluation results. To address these problems, in this paper, we propose TrustGuard, a GNN-based accurate trust evaluation model that supports trust dynamicity, is robust against typical attacks, and provides explanations through visualization. Specifically, TrustGuard is designed with a layered architecture that contains a snapshot input layer, a spatial aggregation layer, a temporal aggregation layer, and a prediction layer. Among them, the spatial aggregation layer can be plugged into a defense mechanism for a robust aggregation of local trust relationships, and the temporal aggregation layer applies an attention mechanism for effective learning of temporal patterns. Extensive experiments on two real-world datasets show that TrustGuard outperforms state-of-the-art GNN-based trust evaluation models with respect to trust prediction across single-timeslot and multi-timeslot, even in the presence of attacks. In particular, TrustGuard can explain its evaluation results by visualizing both spatial and temporal views.
    Comparing the Efficacy of Fine-Tuning and Meta-Learning for Few-Shot Policy Imitation. (arXiv:2306.13554v1 [cs.LG])
    In this paper we explore few-shot imitation learning for control problems, which involves learning to imitate a target policy by accessing a limited set of offline rollouts. This setting has been relatively under-explored despite its relevance to robotics and control applications. State-of-the-art methods developed to tackle few-shot imitation rely on meta-learning, which is expensive to train as it requires access to a distribution over tasks (rollouts from many target policies and variations of the base environment). Given this limitation we investigate an alternative approach, fine-tuning, a family of methods that pretrain on a single dataset and then fine-tune on unseen domain-specific data. Recent work has shown that fine-tuners outperform meta-learners in few-shot image classification tasks, especially when the data is out-of-domain. Here we evaluate to what extent this is true for control problems, proposing a simple yet effective baseline which relies on two stages: (i) training a base policy online via reinforcement learning (e.g. Soft Actor-Critic) on a single base environment, (ii) fine-tuning the base policy via behavioral cloning on a few offline rollouts of the target policy. Despite its simplicity this baseline is competitive with meta-learning methods on a variety of conditions and is able to imitate target policies trained on unseen variations of the original environment. Importantly, the proposed approach is practical and easy to implement, as it does not need any complex meta-training protocol. As a further contribution, we release an open source dataset called iMuJoCo (iMitation MuJoCo) consisting of 154 variants of popular OpenAI-Gym MuJoCo environments with associated pretrained target policies and rollouts, which can be used by the community to study few-shot imitation learning and offline reinforcement learning.
    Physics-informed neural networks modeling for systems with moving immersed boundaries: application to an unsteady flow past a plunging foil. (arXiv:2306.13395v1 [physics.flu-dyn])
    Recently, physics-informed neural networks (PINNs) have been explored extensively for solving various forward and inverse problems and facilitating querying applications in fluid mechanics. However, work on PINNs for unsteady flows past moving bodies, such as flapping wings, is scarce. Earlier studies mostly relied on transferring to a body-attached frame of reference, which is restrictive for handling multiple moving bodies or deforming structures. Hence, in the present work, an immersed-boundary-aware framework has been explored for developing surrogate models for unsteady flows past moving bodies. Specifically, simultaneous pressure recovery and velocity reconstruction from immersed boundary method (IBM) simulation data has been investigated. While the efficacy of velocity reconstruction has been tested against the fine-resolution IBM data, as a further step the recovered pressure was compared with that of an arbitrary Lagrangian-Eulerian (ALE) based solver. Under this framework, two PINN variants, (i) a moving-boundary-enabled standard Navier-Stokes based PINN (MB-PINN), and (ii) a moving-boundary-enabled IBM based PINN (MB-IBM-PINN), have been formulated. A fluid-solid partitioning of the physics losses in MB-IBM-PINN has been allowed, in order to investigate the effects of solid body points while training. This enables MB-IBM-PINN to match the performance of MB-PINN under certain loss-weighting conditions. MB-PINN is found to be superior to MB-IBM-PINN when a priori knowledge of the solid body position and velocity is available. To improve the data efficiency of MB-PINN, a physics-based data sampling technique has also been investigated. It is observed that a suitable combination of physics constraint relaxation and physics-based sampling can achieve a model performance comparable to the case of using all the data points, under a fixed training budget.
    Torsion Graph Neural Networks. (arXiv:2306.13541v1 [cs.LG])
    Geometric deep learning (GDL) models have demonstrated a great potential for the analysis of non-Euclidian data. They are developed to incorporate the geometric and topological information of non-Euclidian data into the end-to-end deep learning architectures. Motivated by the recent success of discrete Ricci curvature in graph neural network (GNNs), we propose TorGNN, an analytic Torsion enhanced Graph Neural Network model. The essential idea is to characterize graph local structures with an analytic torsion based weight formula. Mathematically, analytic torsion is a topological invariant that can distinguish spaces which are homotopy equivalent but not homeomorphic. In our TorGNN, for each edge, a corresponding local simplicial complex is identified, then the analytic torsion (for this local simplicial complex) is calculated, and further used as a weight (for this edge) in message-passing process. Our TorGNN model is validated on link prediction tasks from sixteen different types of networks and node classification tasks from three types of networks. It has been found that our TorGNN can achieve superior performance on both tasks, and outperform various state-of-the-art models. This demonstrates that analytic torsion is a highly efficient topological invariant in the characterization of graph structures and can significantly boost the performance of GNNs.
    Revisiting Automated Prompting: Are We Actually Doing Better?. (arXiv:2304.03609v2 [cs.CL] UPDATED)
    Current literature demonstrates that Large Language Models (LLMs) are great few-shot learners, and prompting significantly increases their performance on a range of downstream tasks in a few-shot learning setting. An attempt to automate human-led prompting followed, with some progress achieved. In particular, subsequent work demonstrates automation can outperform fine-tuning in certain K-shot learning scenarios. In this paper, we revisit techniques for automated prompting on six different downstream tasks and a larger range of K-shot learning settings. We find that automated prompting does not consistently outperform simple manual prompts. Our work suggests that, in addition to fine-tuning, manual prompts should be used as a baseline in this line of research.
    Convergence of First-Order Methods for Constrained Nonconvex Optimization with Dependent Data. (arXiv:2203.15797v2 [math.OC] UPDATED)
    We focus on analyzing the classical stochastic projected gradient methods under a general dependent data sampling scheme for constrained smooth nonconvex optimization. We show the worst-case rate of convergence $\tilde{O}(t^{-1/4})$ and complexity $\tilde{O}(\varepsilon^{-4})$ for achieving an $\varepsilon$-near stationary point in terms of the norm of the gradient of Moreau envelope and gradient mapping. While classical convergence guarantee requires i.i.d. data sampling from the target distribution, we only require a mild mixing condition of the conditional distribution, which holds for a wide class of Markov chain sampling algorithms. This improves the existing complexity for the constrained smooth nonconvex optimization with dependent data from $\tilde{O}(\varepsilon^{-8})$ to $\tilde{O}(\varepsilon^{-4})$ with a significantly simpler analysis. We illustrate the generality of our approach by deriving convergence results with dependent data for stochastic proximal gradient methods, adaptive stochastic gradient algorithm AdaGrad and stochastic gradient algorithm with heavy ball momentum. As an application, we obtain first online nonnegative matrix factorization algorithms for dependent data based on stochastic projected gradient methods with adaptive step sizes and optimal rate of convergence.
    Adversarial Robustness Certification for Bayesian Neural Networks. (arXiv:2306.13614v1 [cs.LG])
    We study the problem of certifying the robustness of Bayesian neural networks (BNNs) to adversarial input perturbations. Given a compact set of input points $T \subseteq \mathbb{R}^m$ and a set of output points $S \subseteq \mathbb{R}^n$, we define two notions of robustness for BNNs in an adversarial setting: probabilistic robustness and decision robustness. Probabilistic robustness is the probability that for all points in $T$ the output of a BNN sampled from the posterior is in $S$. On the other hand, decision robustness considers the optimal decision of a BNN and checks if for all points in $T$ the optimal decision of the BNN for a given loss function lies within the output set $S$. Although exact computation of these robustness properties is challenging due to the probabilistic and non-convex nature of BNNs, we present a unified computational framework for efficiently and formally bounding them. Our approach is based on weight interval sampling, integration, and bound propagation techniques, and can be applied to BNNs with a large number of parameters, and independently of the (approximate) inference method employed to train the BNN. We evaluate the effectiveness of our methods on various regression and classification tasks, including an industrial regression benchmark, MNIST, traffic sign recognition, and airborne collision avoidance, and demonstrate that our approach enables certification of robustness and uncertainty of BNN predictions.
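The probabilistic-robustness definition above is easy to illustrate with a crude Monte Carlo estimate: sample networks from the posterior and check, for each sample, whether the output stays in S for *all* points of T. The two-weight "BNN" and Gaussian posterior below are invented for illustration; the paper's contribution is a formal bound, not this sampling estimate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "BNN" y = w1 * tanh(w0 * x), posterior over weights assumed Gaussian
n = 10_000
w0 = rng.normal(1.0, 0.1, size=n)
w1 = rng.normal(2.0, 0.5, size=n)

# Input set T: a small interval around x = 1; output set S: y in (0.5, 2.5)
T = np.linspace(0.9, 1.1, 21)
lo, hi = 0.5, 2.5

# For each posterior sample, the property must hold for ALL x in T
ys = w1[:, None] * np.tanh(w0[:, None] * T[None, :])   # shape (n, len(T))
holds = ((ys > lo) & (ys < hi)).all(axis=1)
p_robust = holds.mean()    # Monte Carlo estimate of probabilistic robustness
print(p_robust)
```

Sampling gives only an estimate with no guarantee; the weight-interval sampling and bound-propagation machinery in the paper exists precisely to turn this quantity into a certified bound.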
    Training with Mixed-Precision Floating-Point Assignments. (arXiv:2301.13464v2 [cs.LG] UPDATED)
    When training deep neural networks, keeping all tensors in high precision (e.g., 32-bit or even 16-bit floats) is often wasteful. However, keeping all tensors in low precision (e.g., 8-bit floats) can lead to unacceptable accuracy loss. Hence, it is important to use a precision assignment -- a mapping from all tensors (arising in training) to precision levels (high or low) -- that keeps most of the tensors in low precision and leads to sufficiently accurate models. We provide a technique that explores this memory-accuracy tradeoff by generating precision assignments for convolutional neural networks that (i) use less memory and (ii) lead to more accurate convolutional networks at the same time, compared to the precision assignments considered by prior work in low-precision floating-point training. We evaluate our technique on image classification tasks by training convolutional networks on CIFAR-10, CIFAR-100, and ImageNet. Our method typically provides > 2x memory reduction over a baseline precision assignment while preserving training accuracy, and gives further reductions by trading off accuracy. Compared to other baselines which sometimes cause training to diverge, our method provides similar or better memory reduction while avoiding divergence.
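The failure mode of keeping everything in low precision is easy to reproduce: once a running value's floating-point spacing (ULP) exceeds twice the increment being added, the accumulation stalls. A small sketch (the increment and count are arbitrary, chosen to expose the effect):

```python
import numpy as np

inc = 0.0001
n = 10_000   # true sum is 1.0

# float16 accumulation: stalls once the sum's ULP outgrows the increment
acc16 = np.float16(0.0)
for _ in range(n):
    acc16 = np.float16(acc16 + np.float16(inc))

# float32 accumulation: spacing stays far below the increment
acc32 = np.float32(0.0)
for _ in range(n):
    acc32 = np.float32(acc32 + np.float32(inc))

print(float(acc16))   # stalls well below the true sum of 1.0
print(float(acc32))   # close to 1.0
```

This is why tensors like optimizer accumulators typically need high precision, and why a per-tensor precision *assignment*, rather than a blanket choice, is the right framing.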
    Divided Attention: Unsupervised Multi-Object Discovery with Contextually Separated Slots. (arXiv:2304.01430v2 [cs.CV] UPDATED)
    We introduce a method to segment the visual field into independently moving regions, trained with no ground truth or supervision. It consists of an adversarial conditional encoder-decoder architecture based on Slot Attention, modified to use the image as context to decode optical flow without attempting to reconstruct the image itself. In the resulting multi-modal representation, one modality (flow) feeds the encoder to produce separate latent codes (slots), whereas the other modality (image) conditions the decoder to generate the first (flow) from the slots. This design frees the representation from having to encode complex nuisance variability in the image due to, for instance, illumination and reflectance properties of the scene. Since customary autoencoding based on minimizing the reconstruction error does not preclude the entire flow from being encoded into a single slot, we modify the loss to an adversarial criterion based on Contextual Information Separation. The resulting min-max optimization fosters the separation of objects and their assignment to different attention slots, leading to Divided Attention, or DivA. DivA outperforms recent unsupervised multi-object motion segmentation methods while tripling run-time speed up to 104FPS and reducing the performance gap from supervised methods to 12% or less. DivA can handle different numbers of objects and different image sizes at training and test time, is invariant to permutation of object labels, and does not require explicit regularization.
    Dual RL: Unification and New Methods for Reinforcement and Imitation Learning. (arXiv:2302.08560v2 [cs.LG] UPDATED)
    The goal of reinforcement learning (RL) is to maximize the expected cumulative return. It has been shown that this objective can be represented by an optimization problem of the state-action visitation distribution under linear constraints. The dual problem of this formulation, which we refer to as dual RL, is unconstrained and easier to optimize. We show that several state-of-the-art off-policy deep reinforcement learning (RL) algorithms, under both online and offline, RL and imitation learning (IL) settings, can be viewed as dual RL approaches in a unified framework. This unification provides a common ground to study and identify the components that contribute to the success of these methods and also reveals the common shortcomings across methods with new insights for improvement. Our analysis shows that prior off-policy imitation learning methods are based on an unrealistic coverage assumption and are minimizing a particular f-divergence between the visitation distributions of the learned policy and the expert policy. We propose a new method using a simple modification to the dual RL framework that allows for performant imitation learning with arbitrary off-policy data to obtain near-expert performance, without learning a discriminator. Further, by framing a recent SOTA offline RL method XQL in the dual RL framework, we propose alternative choices to replace the Gumbel regression loss, which achieve improved performance and resolve the training instability issue of XQL. Project code and details can be found at this https://hari-sikchi.github.io/dual-rl.
    Predicting Temporal Aspects of Movement for Predictive Replication in Fog Environments. (arXiv:2306.00575v3 [cs.DC] UPDATED)
    To fully exploit the benefits of the fog environment, efficient management of data locality is crucial. Blind or reactive data replication falls short in harnessing the potential of fog computing, necessitating more advanced techniques for predicting where and when clients will connect. While spatial prediction has received considerable attention, temporal prediction remains understudied. Our paper addresses this gap by examining the advantages of incorporating temporal prediction into existing spatial prediction models. We also provide a comprehensive analysis of spatio-temporal prediction models, such as Deep Neural Networks and Markov models, in the context of predictive replication. We propose a novel model using Holt-Winters exponential smoothing for temporal prediction, leveraging sequential and periodic user movement patterns. In a fog network simulation with real user trajectories, our model achieves a 15% reduction in excess data with a marginal 1% decrease in data availability.
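To make the temporal component concrete, here is a minimal additive Holt-Winters (triple exponential smoothing) forecaster in pure Python. This is an illustrative sketch only: the smoothing constants, initialization scheme, and horizon below are our assumptions, not the paper's configuration.

```python
def holt_winters_additive(series, period, alpha=0.5, beta=0.3, gamma=0.2, horizon=4):
    """Additive Holt-Winters forecaster (level + trend + seasonality).

    Sketch of the kind of temporal predictor the paper describes;
    requires len(series) >= 2 * period for initialization.
    """
    # Initialize level, trend, and seasonal components from the first two periods.
    level = sum(series[:period]) / period
    trend = (sum(series[period:2 * period]) - sum(series[:period])) / period ** 2
    season = [series[i] - level for i in range(period)]

    for t, x in enumerate(series):
        last_level = level
        s = season[t % period]
        level = alpha * (x - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        season[t % period] = gamma * (x - level) + (1 - gamma) * s

    # Forecast `horizon` steps beyond the end of the series.
    return [level + (h + 1) * trend + season[(len(series) + h) % period]
            for h in range(horizon)]
```

On a constant series the forecast reduces to that constant, which makes a convenient sanity check before fitting real connection traces.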
    FPGA Implementation of Convolutional Neural Network for Real-Time Handwriting Recognition. (arXiv:2306.13557v1 [cs.AR])
    Machine Learning (ML) has recently been a skyrocketing field in Computer Science. As computer hardware engineers, we are enthusiastic about hardware implementations of popular software ML architectures to optimize their performance, reliability, and resource usage. In this project, we designed a highly configurable, real-time device for recognizing handwritten letters and digits using an Altera DE1 FPGA Kit. We followed various engineering standards, including the IEEE-754 32-bit Floating-Point Standard, the Video Graphics Array (VGA) display protocol, the Universal Asynchronous Receiver-Transmitter (UART) protocol, and the Inter-Integrated Circuit (I2C) protocol to achieve the project goals. These standards significantly improved our design's compatibility, reusability, and ease of verification. Following these standards, we designed a 32-bit floating-point (FP) instruction set architecture (ISA). We developed a 5-stage RISC processor in SystemVerilog to manage image processing, matrix multiplications, ML classifications, and user interfaces. Three different ML architectures were implemented and evaluated on our design: Linear Classification (LC), a 784-64-10 fully connected neural network (NN), and a LeNet-like Convolutional Neural Network (CNN) with ReLU activation layers and 36 classes (10 for the digits and 26 for the case-insensitive letters). The training processes were done in Python scripts, and the resulting kernels and weights were stored in hex files and loaded into the FPGA's SRAM units. Convolution, pooling, data management, and various other ML features were guided by firmware in our custom assembly language. This paper documents the high-level design block diagrams, interfaces between each SystemVerilog module, implementation details of our software and firmware components, and further discussions on potential impacts.
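For readers unfamiliar with the bit layout an IEEE-754 32-bit FP datapath manipulates, the fields can be inspected from software. The helper below is ours (not from the paper) and simply unpacks the sign, biased exponent, and fraction bits of a single-precision value:

```python
import struct

def float32_fields(x):
    """Decompose a value into IEEE-754 single-precision fields.

    Returns (sign, biased_exponent, mantissa): 1 + 8 + 23 bits.
    The exponent carries a bias of 127; the leading 1 of the
    significand is implicit for normal numbers.
    """
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # biased exponent
    mantissa = bits & 0x7FFFFF       # 23 fraction bits
    return sign, exponent, mantissa
```

For example, 1.0 encodes as sign 0, exponent 127 (i.e. 2^0), mantissa 0; a hardware FP adder or multiplier operates on exactly these three fields.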
    Image Shortcut Squeezing: Countering Perturbative Availability Poisons with Compression. (arXiv:2301.13838v2 [cs.CR] UPDATED)
    Perturbative availability poisons (PAPs) add small changes to images to prevent their use for model training. Current research adopts the belief that practical and effective approaches to countering PAPs do not exist. In this paper, we argue that it is time to abandon this belief. We present extensive experiments showing that 12 state-of-the-art PAP methods are vulnerable to Image Shortcut Squeezing (ISS), which is based on simple compression. For example, on average, ISS restores the CIFAR-10 model accuracy to $81.73\%$, surpassing the previous best preprocessing-based countermeasures by $37.97\%$ absolute. ISS also (slightly) outperforms adversarial training and has higher generalizability to unseen perturbation norms and also higher efficiency. Our investigation reveals that the property of PAP perturbations depends on the type of surrogate model used for poison generation, and it explains why a specific ISS compression yields the best performance for a specific type of PAP perturbation. We further test stronger, adaptive poisoning, and show it falls short of being an ideal defense against ISS. Overall, our results demonstrate the importance of considering various (simple) countermeasures to ensure the meaningfulness of analysis carried out during the development of PAP methods.
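The "simple compression" family ISS draws on is easy to illustrate. The sketch below uses bit-depth reduction rather than the paper's JPEG/grayscale compressions, so treat it as a stand-in for the idea: quantizing away low-amplitude detail is what removes the small poisoning perturbations.

```python
import numpy as np

def bit_depth_squeeze(img, bits=3):
    """Requantize an 8-bit image to `bits` bits per channel.

    Illustrative countermeasure in the spirit of image shortcut
    squeezing: small adversarial perturbations fall below the
    coarser quantization grid and are rounded away.
    """
    levels = 2 ** bits - 1
    return np.round(img.astype(np.float32) / 255.0 * levels) / levels * 255.0
```

In a training pipeline this would be applied to every poisoned image before the model sees it; the choice of compression strength trades off perturbation removal against loss of clean image detail.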
    ExpPoint-MAE: Better interpretability and performance for self-supervised point cloud transformers. (arXiv:2306.10798v2 [cs.CV] UPDATED)
    In this paper we delve into the properties of transformers, attained through self-supervision, in the point cloud domain. Specifically, we evaluate the effectiveness of Masked Autoencoding as a pretraining scheme, and explore Momentum Contrast as an alternative. In our study we investigate the impact of data quantity on the learned features, and uncover similarities in the transformer's behavior across domains. Through comprehensive visualizations, we observe that the transformer learns to attend to semantically meaningful regions, indicating that pretraining leads to a better understanding of the underlying geometry. Moreover, we examine the finetuning process and its effect on the learned representations. Based on that, we devise an unfreezing strategy which consistently outperforms our baseline without introducing any other modifications to the model or the training pipeline, and achieves state-of-the-art results in the classification task among transformer models.
    Bring Your Own Data! Self-Supervised Evaluation for Large Language Models. (arXiv:2306.13651v1 [cs.CL])
    With the rise of Large Language Models (LLMs) and their ubiquitous deployment in diverse domains, measuring language model behavior on realistic data is imperative. For example, a company deploying a client-facing chatbot must ensure that the model will not respond to client requests with profanity. Current evaluations approach this problem using small, domain-specific datasets with human-curated labels. These evaluation sets are often sampled from a narrow and simplified distribution, and data sources can unknowingly be leaked into the training set which can lead to misleading evaluations. To bypass these drawbacks, we propose a framework for self-supervised evaluation of LLMs by analyzing their sensitivity or invariance to transformations on the input text. Self-supervised evaluation can directly monitor LLM behavior on datasets collected in the wild or streamed during live model deployment. We demonstrate self-supervised evaluation strategies for measuring closed-book knowledge, toxicity, and long-range context dependence, in addition to sensitivity to grammatical structure and tokenization errors. When comparisons to similar human-labeled benchmarks are available, we find strong correlations between self-supervised and human-supervised evaluations. The self-supervised paradigm complements current evaluation strategies that rely on labeled data.
    Transformed Distribution Matching for Missing Value Imputation. (arXiv:2302.10363v2 [cs.LG] UPDATED)
    We study the problem of imputing missing values in a dataset, which has important applications in many domains. The key to missing value imputation is to capture the data distribution with incomplete samples and impute the missing values accordingly. In this paper, by leveraging the fact that any two batches of data with missing values come from the same data distribution, we propose to impute the missing values of two batches of samples by transforming them into a latent space through deep invertible functions and matching them distributionally. To learn the transformations and impute the missing values simultaneously, a simple and well-motivated algorithm is proposed. Our algorithm has fewer hyperparameters to fine-tune and generates high-quality imputations regardless of how missing values are generated. Extensive experiments over a large number of datasets and competing benchmark algorithms show that our method achieves state-of-the-art performance.
    Improving Signed Propagation for Graph Neural Networks. (arXiv:2301.08918v4 [cs.LG] UPDATED)
    Message-passing Graph Neural Networks (GNNs), which collect information from adjacent nodes, achieve dismal performance on heterophilic graphs. Various schemes have been proposed to solve this problem, and propagating signed information on heterophilic edges has gained great attention. Recently, some works provided theoretical analysis that signed propagation always leads to performance improvement under a binary class scenario. However, we notice that prior analyses do not align well with multi-class benchmark datasets. This paper provides a new understanding of signed propagation for multi-class scenarios and points out two drawbacks in terms of message-passing and parameter update: (1) Message-passing: if two nodes belong to different classes but have a high similarity, signed propagation can decrease the separability. (2) Parameter update: the prediction uncertainty (e.g., conflict evidence) of signed neighbors increases during training, which can impede the stability of the algorithm. Based on these observations, we introduce two novel strategies for improving signed propagation under multi-class graphs. The proposed scheme combines calibration to secure robustness while reducing uncertainty. We show the efficacy of our theorems through extensive experiments on six benchmark graph datasets.
    Gradient Descent with Linearly Correlated Noise: Theory and Applications to Differential Privacy. (arXiv:2302.01463v2 [cs.LG] UPDATED)
    We study gradient descent under linearly correlated noise. Our work is motivated by recent practical methods for optimization with differential privacy (DP), such as DP-FTRL, which achieve strong performance in settings where privacy amplification techniques are infeasible (such as in federated learning). These methods inject privacy noise through a matrix factorization mechanism, making the noise linearly correlated over iterations. We propose a simplified setting that distills key facets of these methods and isolates the impact of linearly correlated noise. We analyze the behavior of gradient descent in this setting, for both convex and non-convex functions. Our analysis is demonstrably tighter than prior work and recovers multiple important special cases exactly (including anticorrelated perturbed gradient descent). We use our results to develop new, effective matrix factorizations for differentially private optimization, and highlight the benefits of these factorizations theoretically and empirically.
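One special case the analysis recovers, anticorrelated perturbed gradient descent, is simple to sketch: the injected noise at step t is the difference z_t - z_{t-1} of i.i.d. draws, so it is linearly correlated across iterations. The toy quadratic objective and hyperparameters below are our assumptions, used only to make the mechanism concrete.

```python
import numpy as np

def anticorrelated_pgd(grad, x0, steps=200, lr=0.1, sigma=0.05, seed=0):
    """Gradient descent with anticorrelated noise xi_t = z_t - z_{t-1}.

    The differenced noise is linearly correlated over iterations,
    the structure DP-FTRL-style matrix mechanisms induce; successive
    injections partially cancel, unlike i.i.d. perturbations.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    z_prev = np.zeros_like(x)
    for _ in range(steps):
        z = sigma * rng.standard_normal(x.shape)
        noise = z - z_prev   # anticorrelated with the previous injection
        z_prev = z
        x = x - lr * (grad(x) + noise)
    return x
```

Running this on a strongly convex quadratic (grad(x) = x), the iterate settles near the optimum despite the persistent noise, because the telescoping noise sum stays bounded.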
    Scaling MLPs: A Tale of Inductive Bias. (arXiv:2306.13575v1 [cs.LG])
    In this work we revisit the most fundamental building block in deep learning, the multi-layer perceptron (MLP), and study the limits of its performance on vision tasks. Empirical insights into MLPs are important for multiple reasons. (1) Given the recent narrative "less inductive bias is better", popularized due to transformers eclipsing convolutional models, it is natural to explore the limits of this hypothesis. To that end, MLPs offer an ideal test bed, being completely free of any inductive bias. (2) MLPs have almost exclusively been the main protagonist in the deep learning theory literature due to their mathematical simplicity, serving as a proxy to explain empirical phenomena observed for more complex architectures. Surprisingly, experimental datapoints for MLPs are very difficult to find in the literature, especially when coupled with large pre-training protocols. This discrepancy between practice and theory is worrying: Do MLPs reflect the empirical advances exhibited by practical models? Or do theorists need to rethink the role of MLPs as a proxy? We provide insights into both these aspects. We show that the performance of MLPs drastically improves with scale (93% on CIFAR10, 79% on CIFAR100, 69% on TinyImageNet), highlighting that lack of inductive bias can indeed be compensated. We observe that MLPs mimic the behaviour of their modern counterparts faithfully, with some components in the learning setting however surprisingly exhibiting stronger or unexpected behaviours. Due to their inherent computational efficiency, large pre-training experiments become more accessible for academic researchers. All of our experiments were run on a single GPU.
    Penalty Gradient Normalization for Generative Adversarial Networks. (arXiv:2306.13576v1 [cs.CV])
    In this paper, we propose a novel normalization method called penalty gradient normalization (PGN) to tackle the training instability of Generative Adversarial Networks (GANs) caused by the sharp gradient space. Unlike existing work such as gradient penalty and spectral normalization, the proposed PGN only imposes a penalty gradient norm constraint on the discriminator function, which increases the capacity of the discriminator. Moreover, the proposed penalty gradient normalization can be applied to different GAN architectures with little modification. Extensive experiments on three datasets show that GANs trained with penalty gradient normalization outperform existing methods in terms of both Frechet Inception Distance and Inception Score.
    Linear-scaling kernels for protein sequences and small molecules outperform deep learning while providing uncertainty quantitation and improved interpretability. (arXiv:2302.03294v2 [cs.LG] UPDATED)
    Gaussian processes (GPs) are Bayesian models which provide several advantages for regression tasks in machine learning, such as reliable quantitation of uncertainty and improved interpretability. Their adoption has been precluded by their excessive computational cost and by the difficulty in adapting them for analyzing sequences (e.g. amino acid and nucleotide sequences) and graphs (e.g. ones representing small molecules). In this study, we develop efficient and scalable approaches for fitting GP models as well as fast convolution kernels which scale linearly with graph or sequence size. We implement these improvements by building an open-source Python library called xGPR. We compare the performance of xGPR with the reported performance of various deep learning models on 20 benchmarks, including small molecule, protein sequence and tabular data. We show that xGPR achieves highly competitive performance with much shorter training time. Furthermore, we also develop new kernels for sequence and graph data and show that xGPR generally outperforms convolutional neural networks on predicting key properties of proteins and small molecules. Importantly, xGPR provides uncertainty information not available from typical deep learning models. Additionally, xGPR provides a representation of the input data that can be used for clustering and data visualization. These results demonstrate that xGPR provides a powerful and generic tool that can be broadly useful in protein engineering and drug discovery.
    Inversion by Direct Iteration: An Alternative to Denoising Diffusion for Image Restoration. (arXiv:2303.11435v3 [eess.IV] UPDATED)
    Inversion by Direct Iteration (InDI) is a new formulation for supervised image restoration that avoids the so-called ``regression to the mean'' effect and produces more realistic and detailed images than existing regression-based methods. It does this by gradually improving image quality in small steps, similar to generative denoising diffusion models. Image restoration is an ill-posed problem where multiple high-quality images are plausible reconstructions of a given low-quality input. Therefore, the outcome of a single-step regression model is typically an aggregate of all possible explanations, thus lacking details and realism. The main advantage of InDI is that it does not try to predict the clean target image in a single step but instead gradually improves the image in small steps, resulting in better perceptual quality. While generative denoising diffusion models also work in small steps, our formulation is distinct in that it does not require knowledge of any analytic form of the degradation process. Instead, we directly learn an iterative restoration process from low-quality and high-quality paired examples. InDI can be applied to virtually any image degradation, given paired training data. In conditional denoising diffusion image restoration the denoising network generates the restored image by repeatedly denoising an initial image of pure noise, conditioned on the degraded input. Contrary to conditional denoising formulations, InDI directly proceeds by iteratively restoring the input low-quality image, producing high-quality results on a variety of image restoration tasks, including motion and out-of-focus deblurring, super-resolution, compression artifact removal, and denoising.
    Instance-Adaptive Video Compression: Improving Neural Codecs by Training on the Test Set. (arXiv:2111.10302v2 [eess.IV] UPDATED)
    We introduce a video compression algorithm based on instance-adaptive learning. On each video sequence to be transmitted, we finetune a pretrained compression model. The optimal parameters are transmitted to the receiver along with the latent code. By entropy-coding the parameter updates under a suitable mixture model prior, we ensure that the network parameters can be encoded efficiently. This instance-adaptive compression algorithm is agnostic about the choice of base model and has the potential to improve any neural video codec. On UVG, HEVC, and Xiph datasets, our codec improves the performance of a scale-space flow model by between 21% and 27% BD-rate savings, and that of a state-of-the-art B-frame model by 17 to 20% BD-rate savings. We also demonstrate that instance-adaptive finetuning improves the robustness to domain shift. Finally, our approach reduces the capacity requirements of compression models. We show that it enables a competitive performance even after reducing the network size by 70%.
    Curvature Filtrations for Graph Generative Model Evaluation. (arXiv:2301.12906v2 [cs.LG] UPDATED)
    Graph generative model evaluation necessitates understanding differences between graphs on the distributional level. This entails being able to harness salient attributes of graphs in an efficient manner. Curvature constitutes one such property of graphs, and has recently started to prove useful in characterising graphs. Its expressive properties, stability, and practical utility in model evaluation remain largely unexplored, however. We combine graph curvature descriptors with emerging methods from topological data analysis to obtain robust, expressive descriptors for evaluating graph generative models.
    Active Coverage for PAC Reinforcement Learning. (arXiv:2306.13601v1 [cs.LG])
    Collecting and leveraging data with good coverage properties plays a crucial role in different aspects of reinforcement learning (RL), including reward-free exploration and offline learning. However, the notion of "good coverage" really depends on the application at hand, as data suitable for one context may not be so for another. In this paper, we formalize the problem of active coverage in episodic Markov decision processes (MDPs), where the goal is to interact with the environment so as to fulfill given sampling requirements. This framework is sufficiently flexible to specify any desired coverage property, making it applicable to any problem that involves online exploration. Our main contribution is an instance-dependent lower bound on the sample complexity of active coverage and a simple game-theoretic algorithm, CovGame, that nearly matches it. We then show that CovGame can be used as a building block to solve different PAC RL tasks. In particular, we obtain a simple algorithm for PAC reward-free exploration with an instance-dependent sample complexity that, in certain MDPs which are "easy to explore", is lower than the minimax one. By further coupling this exploration algorithm with a new technique to do implicit eliminations in policy space, we obtain a computationally-efficient algorithm for best-policy identification whose instance-dependent sample complexity scales with gaps between policy values.
    NetBooster: Empowering Tiny Deep Learning By Standing on the Shoulders of Deep Giants. (arXiv:2306.13586v1 [cs.LG])
    Tiny deep learning has attracted increasing attention driven by the substantial demand for deploying deep learning on numerous intelligent Internet-of-Things devices. However, it is still challenging to unleash tiny deep learning's full potential on both large-scale datasets and downstream tasks due to the under-fitting issues caused by the limited model capacity of tiny neural networks (TNNs). To this end, we propose a framework called NetBooster to empower tiny deep learning by augmenting the architectures of TNNs via an expansion-then-contraction strategy. Extensive experiments show that NetBooster consistently outperforms state-of-the-art tiny deep learning solutions.
    Adversarial Resilience in Sequential Prediction via Abstention. (arXiv:2306.13119v1 [cs.LG])
    We study the problem of sequential prediction in the stochastic setting with an adversary that is allowed to inject clean-label adversarial (or out-of-distribution) examples. Algorithms designed to handle purely stochastic data tend to fail in the presence of such adversarial examples, often leading to erroneous predictions. This is undesirable in many high-stakes applications such as medical recommendations, where abstaining from predictions on adversarial examples is preferable to misclassification. On the other hand, assuming fully adversarial data leads to very pessimistic bounds that are often vacuous in practice. To capture this motivation, we propose a new model of sequential prediction that sits between the purely stochastic and fully adversarial settings by allowing the learner to abstain from making a prediction at no cost on adversarial examples. Assuming access to the marginal distribution on the non-adversarial examples, we design a learner whose error scales with the VC dimension (mirroring the stochastic setting) of the hypothesis class, as opposed to the Littlestone dimension which characterizes the fully adversarial setting. Furthermore, we design a learner for VC dimension 1 classes, which works even in the absence of access to the marginal distribution. Our key technical contribution is a novel measure for quantifying uncertainty for learning VC classes, which may be of independent interest.
    Prediction of Annual Snow Accumulation Using a Recurrent Graph Convolutional Approach. (arXiv:2306.13181v1 [cs.LG])
    The precise tracking and prediction of polar ice layers can unveil historic trends in snow accumulation. In recent years, airborne radar sensors, such as the Snow Radar, have been shown to be able to measure these internal ice layers over large areas with a fine vertical resolution. In our previous work, we found that temporal graph convolutional networks perform reasonably well in predicting future snow accumulation when given temporal graphs containing deep ice layer thickness. In this work, we experiment with a graph attention network-based model and use it to predict more annual snow accumulation data points with fewer input data points on a larger dataset. We find that these large changes have only a very slight negative impact on performance.
    Margin Maximization in Attention Mechanism. (arXiv:2306.13596v1 [cs.LG])
    Attention mechanism is a central component of the transformer architecture which led to the phenomenal success of large language models. However, the theoretical principles underlying the attention mechanism are poorly understood, especially its nonconvex optimization dynamics. In this work, we explore the seminal softmax-attention model $f(\boldsymbol{X})=\langle \boldsymbol{Xv}, \texttt{softmax}(\boldsymbol{XWp})\rangle$, where, $\boldsymbol{X}$ is the token sequence and $(\boldsymbol{v},\boldsymbol{W},\boldsymbol{p})$ are tunable parameters. We prove that running gradient descent on $\boldsymbol{p}$, or equivalently $\boldsymbol{W}$, converges in direction to a max-margin solution that separates $\textit{locally-optimal}$ tokens from non-optimal ones. This clearly formalizes attention as a token separation mechanism. Remarkably, our results are applicable to general data and precisely characterize $\textit{optimality}$ of tokens in terms of the value embeddings $\boldsymbol{Xv}$ and problem geometry. We also provide a broader regularization path analysis that establishes the margin maximizing nature of attention even for nonlinear prediction heads. When optimizing $\boldsymbol{v}$ and $\boldsymbol{p}$ simultaneously with logistic loss, we identify conditions under which the regularization paths directionally converge to their respective hard-margin SVM solutions where $\boldsymbol{v}$ separates the input features based on their labels. Interestingly, the SVM formulation of $\boldsymbol{p}$ is influenced by the support vector geometry of $\boldsymbol{v}$. Finally, we verify our theoretical findings via numerical experiments and provide insights.
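The single-head model analyzed above is compact enough to write out directly. The sketch below implements f(X) = ⟨Xv, softmax(XWp)⟩ in numpy; the token values and shapes in the usage are arbitrary choices of ours for illustration.

```python
import numpy as np

def attention_score(X, v, W, p):
    """The softmax-attention model f(X) = <Xv, softmax(XWp)>.

    X: (T, d) token sequence; v, p: (d,) value/probe vectors;
    W: (d, d) key-query weights. Returns a scalar score: the
    attention-weighted combination of the token values Xv.
    """
    logits = X @ W @ p
    a = np.exp(logits - logits.max())
    a /= a.sum()                 # softmax attention weights over tokens
    return (X @ v) @ a           # weighted sum of per-token values
```

With W = 0 the attention weights are uniform and f(X) reduces to the mean of the token values; as the margin of XWp grows, the softmax concentrates on the highest-scoring token, which is the token-separation behavior the paper's max-margin result formalizes.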
    CLUE: Calibrated Latent Guidance for Offline Reinforcement Learning. (arXiv:2306.13412v1 [cs.LG])
    Offline reinforcement learning (RL) aims to learn an optimal policy from pre-collected and labeled datasets, which eliminates the time-consuming data collection in online RL. However, offline RL still bears a large burden of specifying/handcrafting extrinsic rewards for each transition in the offline data. As a remedy for the labor-intensive labeling, we propose to endow offline RL tasks with a few expert data and utilize the limited expert data to drive intrinsic rewards, thus eliminating the need for extrinsic rewards. To achieve that, we introduce Calibrated Latent Guidance (CLUE), which utilizes a conditional variational auto-encoder to learn a latent space such that intrinsic rewards can be directly qualified over the latent space. CLUE's key idea is to align the intrinsic rewards consistent with the expert intention via enforcing the embeddings of expert data to a calibrated contextual representation. We instantiate the expert-driven intrinsic rewards in sparse-reward offline RL tasks, offline imitation learning (IL) tasks, and unsupervised offline RL tasks. Empirically, we find that CLUE can effectively improve the sparse-reward offline RL performance, outperform the state-of-the-art offline IL baselines, and discover diverse skills from static reward-free offline data.
    Training Transformers with 4-bit Integers. (arXiv:2306.11987v2 [cs.LG] UPDATED)
    Quantizing the activation, weight, and gradient to 4-bit is promising to accelerate neural network training. However, existing 4-bit training methods require custom numerical formats which are not supported by contemporary hardware. In this work, we propose a training method for transformers with all matrix multiplications implemented with the INT4 arithmetic. Training with an ultra-low INT4 precision is challenging. To achieve this, we carefully analyze the specific structures of activation and gradients in transformers to propose dedicated quantizers for them. For forward propagation, we identify the challenge of outliers and propose a Hadamard quantizer to suppress the outliers. For backpropagation, we leverage the structural sparsity of gradients by proposing bit splitting and leverage score sampling techniques to quantize gradients accurately. Our algorithm achieves competitive accuracy on a wide range of tasks including natural language understanding, machine translation, and image classification. Unlike previous 4-bit training methods, our algorithm can be implemented on the current generation of GPUs. Our prototypical linear operator implementation is up to 2.2 times faster than the FP16 counterparts and speeds up the training by up to 35.1%.
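The intuition behind the Hadamard quantizer can be shown in a few lines: an orthonormal Hadamard rotation spreads an activation outlier across all coordinates, so a symmetric 4-bit grid wastes less of its range on a single spike. The sketch below is a simplification of ours (single full-size rotation, one scale per tensor); the paper's actual quantizer choices may differ.

```python
import numpy as np

def hadamard(n):
    """Orthonormal Hadamard matrix via the Sylvester construction.

    n must be a power of two; H @ H.T == I after 1/sqrt(n) scaling.
    """
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def int4_quantize(x):
    """Symmetric quantization to the 16 signed 4-bit levels [-8, 7]."""
    scale = np.abs(x).max() / 7.0 + 1e-12
    q = np.clip(np.round(x / scale), -8, 7)
    return q, scale

def hadamard_int4(x):
    """Rotate with a Hadamard matrix, quantize to INT4, rotate back.

    The rotation mixes an outlier coordinate into all dimensions,
    shrinking the dynamic range the 4-bit grid has to cover.
    """
    H = hadamard(x.shape[-1])
    q, scale = int4_quantize(x @ H)
    return (q * scale) @ H.T   # dequantized approximation of x
```

Because the Hadamard matrix has ±1 entries (up to scaling), the rotation itself is cheap to fuse into a matrix multiply, which is part of why this style of quantizer fits standard INT4 hardware paths.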
    Improving Gender Fairness of Pre-Trained Language Models without Catastrophic Forgetting. (arXiv:2110.05367v2 [cs.CL] UPDATED)
    Existing studies addressing gender bias of pre-trained language models usually build a small gender-neutral data set and conduct a second-phase pre-training on the model with such data. However, given the limited size and concentrated focus of the gender-neutral data, catastrophic forgetting would occur during second-phase pre-training. Forgetting information in the original training data may damage the model's downstream performance by a large margin. In this work, we empirically show that catastrophic forgetting occurs in such methods by evaluating them with general NLP tasks in GLUE. Then, we propose a new method, GEnder Equality Prompt (GEEP), to improve gender fairness of pre-trained models with less forgetting. GEEP freezes the pre-trained model and learns gender-related prompts with gender-neutral data. Empirical results show that GEEP not only achieves SOTA performances on gender fairness tasks, but also forgets less and performs better on GLUE by a large margin.
    Best Arm Identification in Stochastic Bandits: Beyond $\beta-$optimality. (arXiv:2301.03785v2 [stat.ML] UPDATED)
    This paper investigates a hitherto unaddressed aspect of best arm identification (BAI) in stochastic multi-armed bandits in the fixed-confidence setting. Two key metrics for assessing bandit algorithms are computational efficiency and performance optimality (e.g., in sample complexity). In stochastic BAI literature, there have been advances in designing algorithms to achieve optimal performance, but they are generally computationally expensive to implement (e.g., optimization-based methods). There also exist approaches with high computational efficiency, but they have provable gaps to the optimal performance (e.g., the $\beta$-optimal approaches in top-two methods). This paper introduces a framework and an algorithm for BAI that achieves optimal performance with a computationally efficient set of decision rules. The central process that facilitates this is a routine for sequentially estimating the optimal allocations up to sufficient fidelity. Specifically, these estimates are accurate enough for identifying the best arm (hence, achieving optimality) but not overly accurate to an unnecessary extent that creates excessive computational complexity (hence, maintaining efficiency). Furthermore, the existing relevant literature focuses on the family of exponential distributions. This paper considers a more general setting of any arbitrary family of distributions parameterized by their mean values (under mild regularity conditions). The optimality is established analytically, and numerical evaluations are provided to assess the analytical guarantees and compare the performance with those of the existing ones.
    The Challenges of HTR Model Training: Feedback from the Project Donner le goût de l'archive à l'ère numérique. (arXiv:2212.11146v3 [cs.CV] UPDATED)
    The arrival of handwriting recognition technologies offers new possibilities for research in heritage studies. However, it is now necessary to reflect on the experiences and the practices developed by research teams. Our use of the Transkribus platform since 2018 has led us to search for the most effective ways to improve the performance of our handwritten text recognition (HTR) models, which are made to transcribe French handwriting dating from the 17th century. This article therefore reports on the impacts of creating transcribing protocols, using the language model at full scale and determining the best way to use base models in order to help increase the performance of HTR models. Combining all of these elements can indeed increase the performance of a single model by more than 20% (reaching a Character Error Rate below 5%). This article also discusses some challenges regarding the collaborative nature of HTR platforms such as Transkribus and the way researchers can share their data generated in the process of creating or training handwritten text recognition models.
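The Character Error Rate cited above (the project's target: below 5%) is the character-level edit distance between the model's transcription and the reference, normalized by the reference length. A minimal implementation, using plain dynamic-programming Levenshtein distance:

```python
def cer(reference, hypothesis):
    """Character Error Rate = edit distance / len(reference).

    Counts the minimum number of character insertions, deletions,
    and substitutions turning `hypothesis` into `reference`.
    """
    m, n = len(reference), len(hypothesis)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = reference[i - 1] != hypothesis[j - 1]
            cur[j] = min(prev[j] + 1,        # deletion
                         cur[j - 1] + 1,     # insertion
                         prev[j - 1] + cost) # substitution (or match)
        prev = cur
    return prev[n] / max(m, 1)
```

For example, cer("chat", "chut") is 0.25: one substitution over four reference characters. Note CER can exceed 1.0 when the hypothesis is much longer than the reference.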
    Catching Image Retrieval Generalization. (arXiv:2306.13357v1 [cs.LG])
    The concepts of overfitting and generalization are vital for evaluating machine learning models. In this work, we show that the popular Recall@K metric depends on the number of classes in the dataset, which limits its ability to estimate generalization. To fix this issue, we propose a new metric, which measures retrieval performance, and, unlike Recall@K, estimates generalization. We apply the proposed metric to popular image retrieval methods and provide new insights about deep metric learning generalization.
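The class-count dependence of Recall@K can be seen directly: under uninformative retrieval, Recall@1 is roughly 1/num_classes, so the same score means different things on datasets with different numbers of classes. A toy sketch (hypothetical helper, not the paper's proposed replacement metric):

```python
import random

def recall_at_k(query_labels, retrieved_labels, k=1):
    """Fraction of queries whose top-k retrieved items include the query's class."""
    hits = sum(q in retrieved[:k]
               for q, retrieved in zip(query_labels, retrieved_labels))
    return hits / len(query_labels)

# With purely random retrieval, Recall@1 is about 1/num_classes,
# so it shrinks as the number of classes grows:
random.seed(0)
num_classes = 4
queries = [random.randrange(num_classes) for _ in range(10000)]
retrieved = [[random.randrange(num_classes)] for _ in queries]
print(recall_at_k(queries, retrieved, k=1))  # close to 0.25 for 4 classes
```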
    LEACE: Perfect linear concept erasure in closed form. (arXiv:2306.03819v2 [cs.LG] UPDATED)
    Concept erasure aims to remove specified features from a representation. It can improve fairness (e.g. preventing a classifier from using gender or race) and interpretability (e.g. removing a concept to observe changes in model behavior). We introduce LEAst-squares Concept Erasure (LEACE), a closed-form method which provably prevents all linear classifiers from detecting a concept while changing the representation as little as possible, as measured by a broad class of norms. We apply LEACE to large language models with a novel procedure called "concept scrubbing," which erases target concept information from every layer in the network. We demonstrate our method on two tasks: measuring the reliance of language models on part-of-speech information, and reducing gender bias in BERT embeddings. Code is available at https://github.com/EleutherAI/concept-erasure.
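As rough intuition for linear concept erasure: a binary concept becomes undetectable to linear classifiers that exploit mean differences once the class-conditional means coincide. The sketch below equalises class means; it is a simplified stand-in for the idea, not LEACE's norm-minimal whitened projection:

```python
import numpy as np

def erase_concept(X, z):
    """Shift each concept group so its mean matches the global mean,
    removing the mean difference a linear classifier could exploit
    (a simplified stand-in for LEACE's closed-form projection)."""
    X = X.astype(float).copy()
    mu = X.mean(axis=0)
    for c in np.unique(z):
        X[z == c] += mu - X[z == c].mean(axis=0)
    return X

rng = np.random.default_rng(0)
z = rng.integers(0, 2, size=500)                  # binary concept labels
X = rng.normal(size=(500, 3)) + 2.0 * z[:, None]  # concept shifts the mean
Xe = erase_concept(X, z)
# After erasure the group means coincide, so no linear classifier can
# separate the concept groups by their means:
print(np.abs(Xe[z == 0].mean(0) - Xe[z == 1].mean(0)).max())
```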
    Binary domain generalization for sparsifying binary neural networks. (arXiv:2306.13515v1 [cs.LG])
    Binary neural networks (BNNs) are an attractive solution for developing and deploying deep neural network (DNN)-based applications in resource-constrained devices. Despite their success, BNNs still suffer from a fixed and limited compression factor, which may be explained by the fact that existing pruning methods for full-precision DNNs cannot be directly applied to BNNs. In fact, weight pruning of BNNs leads to performance degradation, which suggests that the standard binarization domain of BNNs is not well adapted for the task. This work proposes a novel, more general binary domain that extends the standard one and is more robust to pruning techniques, thus guaranteeing improved compression and avoiding severe performance losses. We demonstrate a closed-form solution for quantizing the weights of a full-precision network into the proposed binary domain. Finally, we show the flexibility of our method, which can be combined with other pruning strategies. Experiments over CIFAR-10 and CIFAR-100 demonstrate that the novel approach is able to generate efficient sparse networks with reduced memory usage and run-time latency, while maintaining performance.
    Selectively increasing the diversity of GAN-generated samples. (arXiv:2207.01561v3 [cs.CV] UPDATED)
    Generative Adversarial Networks (GANs) are powerful models able to synthesize data samples closely resembling the distribution of real data, yet the diversity of those generated samples is limited due to the so-called mode collapse phenomenon observed in GANs. Especially prone to mode collapse are conditional GANs, which tend to ignore the input noise vector and focus on the conditional information. Recent methods proposed to mitigate this limitation increase the diversity of generated samples, yet they reduce the performance of the models when similarity of samples is required. To address this shortcoming, we propose a novel method to selectively increase the diversity of GAN-generated samples. By adding a simple, yet effective regularization to the training loss function we encourage the generator to discover new data modes for inputs related to diverse outputs while generating consistent samples for the remaining ones. More precisely, we maximise the ratio of distances between generated images and input latent vectors, scaling the effect according to the diversity of samples for a given conditional input. We show the superiority of our method in a synthetic benchmark as well as a real-life scenario of simulating data from the Zero Degree Calorimeter of the ALICE experiment at the LHC, CERN.
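The distance-ratio term described above resembles mode-seeking regularisation; a toy numerical sketch of such a term follows (an illustrative stand-in, not the authors' exact selective loss, in which the weight would be scaled by the per-condition sample diversity):

```python
import numpy as np

def diversity_regulariser(gen_pair, z_pair, weight=1.0, eps=1e-8):
    """Mode-seeking style term: ratio of output distance to latent distance.
    Subtracting it from the generator loss (hence the minus sign) rewards
    mapping distinct latents to distinct samples; a collapsed generator,
    which maps both latents to the same output, earns no reward."""
    g1, g2 = gen_pair
    z1, z2 = z_pair
    ratio = np.linalg.norm(g1 - g2) / (np.linalg.norm(z1 - z2) + eps)
    return -weight * ratio  # added to the generator loss

rng = np.random.default_rng(1)
z1, z2 = rng.normal(size=4), rng.normal(size=4)
collapsed = diversity_regulariser((np.zeros(8), np.zeros(8)), (z1, z2))
diverse = diversity_regulariser((rng.normal(size=8), rng.normal(size=8)), (z1, z2))
print(collapsed, diverse)  # collapsed outputs get the worse (zero) bonus
```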
    Approximate Causal Effect Identification under Weak Confounding. (arXiv:2306.13242v1 [stat.ML])
    Causal effect estimation has been studied by many researchers when only observational data is available. Sound and complete algorithms have been developed for pointwise estimation of identifiable causal queries. For non-identifiable causal queries, researchers developed polynomial programs to estimate tight bounds on causal effect. However, these are computationally difficult to optimize for variables with large support sizes. In this paper, we analyze the effect of "weak confounding" on causal estimands. More specifically, under the assumption that the unobserved confounders that render a query non-identifiable have small entropy, we propose an efficient linear program to derive the upper and lower bounds of the causal effect. We show that our bounds are consistent in the sense that as the entropy of unobserved confounders goes to zero, the gap between the upper and lower bound vanishes. Finally, we conduct synthetic and real data simulations to compare our bounds with the bounds obtained by the existing work that cannot incorporate such entropy constraints and show that our bounds are tighter for the setting with weak confounders.
    On the Informativeness of Supervision Signals. (arXiv:2211.01407v2 [cs.LG] UPDATED)
    Supervised learning typically focuses on learning transferable representations from training examples annotated by humans. While rich annotations (like soft labels) carry more information than sparse annotations (like hard labels), they are also more expensive to collect. For example, while hard labels only provide information about the closest class an object belongs to (e.g., "this is a dog"), soft labels provide information about the object's relationship with multiple classes (e.g., "this is most likely a dog, but it could also be a wolf or a coyote"). We use information theory to compare how a number of commonly-used supervision signals contribute to representation-learning performance, as well as how their capacity is affected by factors such as the number of labels, classes, dimensions, and noise. Our framework provides theoretical justification for using hard labels in the big-data regime, but richer supervision signals for few-shot learning and out-of-distribution generalization. We validate these results empirically in a series of experiments with over 1 million crowdsourced image annotations and conduct a cost-benefit analysis to establish a tradeoff curve that enables users to optimize the cost of supervising representation learning on their own datasets.
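One way to see the hard/soft label contrast in bits: a hard label is a point mass over classes, while a soft label additionally encodes a distribution over plausible alternatives. A toy sketch (illustrative numbers only, not the paper's information-theoretic framework):

```python
import math

def entropy_bits(p):
    """Shannon entropy of a distribution over classes, in bits."""
    return -sum(q * math.log2(q) for q in p if q > 0)

num_classes = 8
hard = [0.0] * num_classes
hard[3] = 1.0                      # "this is a dog": a point mass
soft = [0.02] * num_classes
soft[3] = 0.72                     # "most likely a dog,
soft[5] = 0.16                     #  but it could be a wolf"

# The hard label carries no information beyond the arg-max class,
# while the soft label also encodes graded class relationships:
print(entropy_bits(hard), entropy_bits(soft))
```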
    Compression with Bayesian Implicit Neural Representations. (arXiv:2305.19185v2 [cs.LG] UPDATED)
    Many common types of data can be represented as functions that map coordinates to signal values, such as pixel locations to RGB values in the case of an image. Based on this view, data can be compressed by overfitting a compact neural network to its functional representation and then encoding the network weights. However, most current solutions for this are inefficient, as quantization to low-bit precision substantially degrades the reconstruction quality. To address this issue, we propose overfitting variational Bayesian neural networks to the data and compressing an approximate posterior weight sample using relative entropy coding instead of quantizing and entropy coding it. This strategy enables direct optimization of the rate-distortion performance by minimizing the $\beta$-ELBO, and allows targeting different rate-distortion trade-offs for a given network architecture by adjusting $\beta$. Moreover, we introduce an iterative algorithm for learning prior weight distributions and employ a progressive refinement process for the variational posterior that significantly enhances performance. Experiments show that our method achieves strong performance on image and audio compression while retaining simplicity.
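Minimising the $\beta$-ELBO amounts to a rate-distortion objective: distortion plus $\beta$ times the total KL rate between the variational posterior and the prior. A sketch with factorised Gaussian weight posteriors (illustrative only, not the paper's training code):

```python
import math

def kl_gauss(mu_q, sig_q, mu_p, sig_p):
    """KL( N(mu_q, sig_q^2) || N(mu_p, sig_p^2) ) in nats."""
    return (math.log(sig_p / sig_q)
            + (sig_q**2 + (mu_q - mu_p)**2) / (2 * sig_p**2) - 0.5)

def beta_elbo_loss(distortion, kls, beta):
    """Rate-distortion objective: distortion + beta * total rate (KL).
    A larger beta prices bits more heavily, driving lower-rate posteriors."""
    return distortion + beta * sum(kls)

# Two weight posteriors measured against a standard-normal prior:
kls = [kl_gauss(0.3, 0.8, 0.0, 1.0), kl_gauss(-0.1, 0.9, 0.0, 1.0)]
low_rate = beta_elbo_loss(distortion=0.5, kls=kls, beta=1e-2)
high_rate_penalty = beta_elbo_loss(distortion=0.5, kls=kls, beta=1.0)
print(low_rate, high_rate_penalty)
```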
    DeepJoin: Joinable Table Discovery with Pre-trained Language Models. (arXiv:2212.07588v2 [cs.DB] UPDATED)
    Due to the usefulness in data enrichment for data analysis tasks, joinable table discovery has become an important operation in data lake management. Existing approaches target equi-joins, the most common way of combining tables for creating a unified view, or semantic joins, which tolerate misspellings and different formats to deliver more join results. They are either exact solutions whose running time is linear in the sizes of query column and target table repository or approximate solutions lacking precision. In this paper, we propose Deepjoin, a deep learning model for accurate and efficient joinable table discovery. Our solution is an embedding-based retrieval, which employs a pre-trained language model (PLM) and is designed as one framework serving both equi- and semantic joins. We propose a set of contextualization options to transform column contents to a text sequence. The PLM reads the sequence and is fine-tuned to embed columns to vectors such that columns are expected to be joinable if they are close to each other in the vector space. Since the output of the PLM is fixed in length, the subsequent search procedure becomes independent of the column size. With a state-of-the-art approximate nearest neighbor search algorithm, the search time is logarithmic in the repository size. To train the model, we devise the techniques for preparing training data as well as data augmentation. The experiments on real datasets demonstrate that by training on a small subset of a corpus, Deepjoin generalizes to large datasets and its precision consistently outperforms other approximate solutions'. Deepjoin is even more accurate than an exact solution to semantic joins when evaluated with labels from experts. Moreover, when equipped with a GPU, Deepjoin is up to two orders of magnitude faster than existing solutions.
    A Survey on Multimodal Large Language Models. (arXiv:2306.13549v1 [cs.CV])
    The Multimodal Large Language Model (MLLM) has recently emerged as a new research hotspot; it uses powerful Large Language Models (LLMs) as a brain to perform multimodal tasks. The surprising emergent capabilities of MLLMs, such as writing stories based on images and OCR-free math reasoning, are rare in traditional methods, suggesting a potential path to artificial general intelligence. In this paper, we aim to trace and summarize the recent progress of MLLMs. First, we present the formulation of the MLLM and delineate its related concepts. Then, we discuss the key techniques and applications, including Multimodal Instruction Tuning (M-IT), Multimodal In-Context Learning (M-ICL), Multimodal Chain of Thought (M-CoT), and LLM-Aided Visual Reasoning (LAVR). Finally, we discuss existing challenges and point out promising research directions. In light of the fact that the era of MLLM has only just begun, we will keep updating this survey and hope it can inspire more research. An associated GitHub link collecting the latest papers is available at https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models.
    Neural Algorithmic Reasoning Without Intermediate Supervision. (arXiv:2306.13411v1 [cs.LG])
    Neural Algorithmic Reasoning is an emerging area of machine learning focusing on building models which can imitate the execution of classic algorithms, such as sorting, shortest paths, etc. One of the main challenges is to learn algorithms that are able to generalize to out-of-distribution data, in particular with significantly larger input sizes. Recent work on this problem has demonstrated the advantages of learning algorithms step-by-step, giving models access to all intermediate steps of the original algorithm. In this work, we instead focus on learning neural algorithmic reasoning only from input-output pairs, without appealing to intermediate supervision. We propose simple but effective architectural improvements and also build a self-supervised objective that can regularise intermediate computations of the model without access to the algorithm trajectory. We demonstrate that our approach is competitive with its trajectory-supervised counterpart on tasks from the CLRS Algorithmic Reasoning Benchmark and achieves new state-of-the-art results for several problems, including sorting, where we obtain significant improvements. Thus, learning without intermediate supervision is a promising direction for further research on neural reasoners.
    DiffInfinite: Large Mask-Image Synthesis via Parallel Random Patch Diffusion in Histopathology. (arXiv:2306.13384v1 [eess.IV])
    We present DiffInfinite, a hierarchical diffusion model that generates arbitrarily large histological images while preserving long-range correlation structural information. Our approach first generates synthetic segmentation masks, subsequently used as conditions for the high-fidelity generative diffusion process. The proposed sampling method can be scaled up to any desired image size while only requiring small patches for fast training. Moreover, it can be parallelized more efficiently than previous large-content generation methods while avoiding tiling artefacts. The training leverages classifier-free guidance to augment a small, sparsely annotated dataset with unlabelled data. Our method alleviates unique challenges in histopathological imaging practice: large-scale information, costly manual annotation, and protective data handling. The biological plausibility of DiffInfinite data is validated in a survey by ten experienced pathologists as well as a downstream segmentation task. Furthermore, the model scores strongly on anti-copying metrics which is beneficial for the protection of patient data.
    Pre or Post-Softmax Scores in Gradient-based Attribution Methods, What is Best?. (arXiv:2306.13197v1 [cs.LG])
    Gradient-based attribution methods for neural networks working as classifiers use gradients of network scores. Here we discuss the practical differences between using gradients of pre-softmax scores versus post-softmax scores, and their respective advantages and disadvantages.
    Learning rewards for robotic ultrasound scanning using probabilistic temporal ranking. (arXiv:2002.01240v3 [cs.RO] UPDATED)
    Informative path-planning is a well-established approach to visual-servoing and active viewpoint selection in robotics, but typically assumes that a suitable cost function or goal state is known. This work considers the inverse problem, where the goal of the task is unknown, and a reward function needs to be inferred from exploratory example demonstrations provided by a demonstrator, for use in a downstream informative path-planning policy. Unfortunately, many existing reward inference strategies are unsuited to this class of problems, due to the exploratory nature of the demonstrations. In this paper, we propose an alternative approach to cope with the class of problems where these sub-optimal, exploratory demonstrations occur. We hypothesise that, in tasks which require discovery, successive states of any demonstration are progressively more likely to be associated with a higher reward, and use this hypothesis to generate time-based binary comparison outcomes and infer reward functions that support these ranks, under a probabilistic generative model. We formalise this \emph{probabilistic temporal ranking} approach and show that it improves upon existing approaches to perform reward inference for autonomous ultrasound scanning, a novel application of learning from demonstration in medical imaging, while also being of value across a broad range of goal-oriented learning from demonstration tasks. Keywords: visual servoing, reward inference, probabilistic temporal ranking.
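The hypothesis above turns a single exploratory demonstration into preference data: for every ordered pair of states, the later one is labelled as having the higher reward. A sketch of this data-generation step (illustrative only; the paper fits a probabilistic generative model to these comparisons):

```python
import itertools
import random

def temporal_comparisons(trajectory, n_pairs=None, seed=0):
    """Generate binary comparison outcomes from one demonstration:
    under the temporal-ranking hypothesis, the later state of any
    pair is assumed preferred (label 1)."""
    pairs = [(trajectory[i], trajectory[j], 1)  # 1: later state preferred
             for i, j in itertools.combinations(range(len(trajectory)), 2)]
    if n_pairs is not None:
        random.Random(seed).shuffle(pairs)
        pairs = pairs[:n_pairs]
    return pairs

demo = ["s0", "s1", "s2", "s3"]
print(temporal_comparisons(demo))  # 6 pairs, later state always preferred
```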
    SPRT-based Efficient Best Arm Identification in Stochastic Bandits. (arXiv:2207.11158v3 [stat.ML] UPDATED)
    This paper investigates the best arm identification (BAI) problem in stochastic multi-armed bandits in the fixed-confidence setting. The general class of the exponential family of bandits is considered. The existing algorithms for the exponential family of bandits face computational challenges. To mitigate these challenges, the BAI problem is viewed and analyzed as a sequential composite hypothesis testing task, and a framework is proposed that adopts the likelihood ratio-based tests known to be effective for sequential testing. Based on this test statistic, a BAI algorithm is designed that leverages the canonical sequential probability ratio tests for arm selection and is amenable to tractable analysis for the exponential family of bandits. This algorithm has two key features: (1) its sample complexity is asymptotically optimal, and (2) it is guaranteed to be $\delta$-PAC. Existing efficient approaches focus on the Gaussian setting and require Thompson sampling for the arm deemed the best and the challenger arm. Additionally, this paper analytically quantifies the computational expense of identifying the challenger in an existing approach. Finally, numerical experiments are provided to support the analysis.
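The building block the algorithm leverages is Wald's sequential probability ratio test: accumulate the log-likelihood ratio of the two hypotheses and stop at thresholds set by the target error rates. A generic sketch (the classic test itself, not the paper's arm-selection rule):

```python
import math

def sprt(log_likelihood_ratios, alpha=0.05, beta=0.05):
    """Wald's SPRT: accumulate the log-likelihood ratio of H1 vs H0 and
    stop at the classic thresholds log(beta/(1-alpha)), log((1-beta)/alpha)."""
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    s = 0.0
    for t, llr in enumerate(log_likelihood_ratios, 1):
        s += llr
        if s >= upper:
            return "accept H1", t
        if s <= lower:
            return "accept H0", t
    return "continue", len(log_likelihood_ratios)

# Evidence drifting steadily toward H1 crosses the upper boundary early:
print(sprt([0.8] * 10))
```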
    Trading-off price for data quality to achieve fair online allocation. (arXiv:2306.13440v1 [cs.LG])
    We consider the problem of online allocation subject to a long-term fairness penalty. Contrary to existing works, however, we do not assume that the decision-maker observes the protected attributes -- which is often unrealistic in practice. Instead they can purchase data that help estimate them from sources of different quality; and hence reduce the fairness penalty at some cost. We model this problem as a multi-armed bandit problem where each arm corresponds to the choice of a data source, coupled with the online allocation problem. We propose an algorithm that jointly solves both problems and show that it has a regret bounded by $\mathcal{O}(\sqrt{T})$. A key difficulty is that the rewards received by selecting a source are correlated by the fairness penalty, which leads to a need for randomization (despite a stochastic setting). Our algorithm takes into account contextual information available before the source selection, and can adapt to many different fairness notions. We also show that in some instances, the estimates used can be learned on the fly.
    Retrieval of Boost Invariant Symbolic Observables via Feature Importance. (arXiv:2306.13496v1 [physics.comp-ph])
    Deep learning approaches for jet tagging in high-energy physics are characterized as black boxes that process a large amount of information from which it is difficult to extract key distinctive observables. In this proceeding, we present an alternative to deep learning approaches, Boost Invariant Polynomials, which enables direct analysis of simple analytic expressions representing the most important features in a given task. Further, we show how this approach provides an extremely low-dimensional classifier with a minimal set of features representing physically relevant observables, and how it consequently speeds up the algorithm execution while performing relatively close to the algorithm that uses the full information.
    Offline Skill Graph (OSG): A Framework for Learning and Planning using Offline Reinforcement Learning Skills. (arXiv:2306.13630v1 [cs.RO])
    Reinforcement Learning has received wide interest due to its success in competitive games. Yet, its adoption in everyday applications is limited (e.g. industrial, home, healthcare, etc.). In this paper, we address this limitation by presenting a framework for planning over offline skills and solving complex tasks in real-world environments. Our framework is comprised of three modules that together enable the agent to learn from previously collected data and generalize over it to solve long-horizon tasks. We demonstrate our approach by testing it on a robotic arm that is required to solve complex tasks.
    Co-creating a globally interpretable model with human input. (arXiv:2306.13381v1 [cs.HC])
    We consider an aggregated human-AI collaboration aimed at generating a joint interpretable model. The model takes the form of Boolean decision rules, where human input is provided in the form of logical conditions or as partial templates. This focus on the combined construction of a model offers a different perspective on joint decision making. Previous efforts have typically focused on aggregating outcomes rather than decision logic. We demonstrate the proposed approach through two examples and highlight the usefulness and challenges of the approach.
    Correcting discount-factor mismatch in on-policy policy gradient methods. (arXiv:2306.13284v1 [cs.LG])
    The policy gradient theorem gives a convenient form of the policy gradient in terms of three factors: an action value, a gradient of the action likelihood, and a state distribution involving discounting called the \emph{discounted stationary distribution}. But commonly used on-policy methods based on the policy gradient theorem ignore the discount factor in the state distribution, which is technically incorrect and may even cause degenerate learning behavior in some environments. An existing solution corrects this discrepancy by using $\gamma^t$ as a factor in the gradient estimate. However, this solution is not widely adopted and does not work well in tasks where the later states are similar to earlier states. We introduce a novel distribution correction to account for the discounted stationary distribution that can be plugged into many existing gradient estimators. Our correction circumvents the performance degradation associated with the $\gamma^t$ correction with a lower variance. Importantly, compared to the uncorrected estimators, our algorithm provides improved state emphasis to evade suboptimal policies in certain environments and consistently matches or exceeds the original performance on several OpenAI gym and DeepMind suite benchmarks.
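Concretely, the $\gamma^t$ correction multiplies each timestep's term $\nabla \log \pi(a_t \mid s_t)\, G_t$ by the discount factor raised to the time index, which matches the discounted stationary distribution but shrinks late-trajectory terms toward zero; a sketch of the two weighting schemes (illustrative, not the paper's proposed correction):

```python
import numpy as np

def reinforce_weights(T, gamma, discount_states=True):
    """Per-timestep weights on grad log pi(a_t|s_t) * G_t in a Monte-Carlo
    policy-gradient estimate. The policy gradient theorem calls for a
    gamma**t factor on the state distribution; common implementations drop
    it (discount_states=False), biasing the state weighting."""
    return np.array([gamma**t if discount_states else 1.0 for t in range(T)])

gamma, T = 0.99, 200
exact = reinforce_weights(T, gamma, discount_states=True)
common = reinforce_weights(T, gamma, discount_states=False)
# The uncorrected estimator over-weights late states relative to the
# theorem, while the corrected one nearly ignores them:
print(common[-1] / exact[-1])  # grows like gamma**-t
```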
    Exploring Context Generalizability in Citywide Crowd Mobility Prediction: An Analytic Framework and Benchmark. (arXiv:2106.16046v4 [cs.LG] UPDATED)
    Contextual features are important data sources for building citywide crowd mobility prediction models. However, the difficulty of applying context lies in the unknown generalizability of contextual features (e.g., weather, holiday, and points of interest) and context modeling techniques across different scenarios. In this paper, we present a unified analytic framework and a large-scale benchmark for evaluating context generalizability. The benchmark includes crowd mobility data, contextual data, and advanced prediction models. We conduct comprehensive experiments in several crowd mobility prediction tasks such as bike flow, metro passenger flow, and electric vehicle charging demand. Our results reveal several important observations: (1) Using more contextual features may not always result in better prediction with existing context modeling techniques; in particular, the combination of holiday and temporal position can provide more generalizable beneficial information than other contextual feature combinations. (2) In context modeling techniques, using a gated unit to incorporate raw contextual features into the deep prediction model has good generalizability. Besides, we offer several suggestions about incorporating contextual factors for building crowd mobility prediction applications. From our findings, we call for future research efforts devoted to developing new context modeling solutions.
    STEEL: Singularity-aware Reinforcement Learning. (arXiv:2301.13152v4 [stat.ML] UPDATED)
    Batch reinforcement learning (RL) aims at leveraging pre-collected data to find an optimal policy that maximizes the expected total rewards in a dynamic environment. Nearly all existing algorithms rely on an absolute continuity assumption on the distribution induced by target policies with respect to the data distribution, so that the batch data can be used to calibrate target policies via the change of measure. However, the absolute continuity assumption could be violated in practice (e.g., no-overlap support), especially when the state-action space is large or continuous. In this paper, we propose a new batch RL algorithm without requiring absolute continuity in the setting of an infinite-horizon Markov decision process with continuous states and actions. We call our algorithm STEEL: SingulariTy-awarE rEinforcement Learning. Our algorithm is motivated by a new error analysis on off-policy evaluation, where we use maximum mean discrepancy, together with distributionally robust optimization, to characterize the error of off-policy evaluation caused by the possible singularity and to enable model extrapolation. By leveraging the idea of pessimism and under some mild conditions, we derive a finite-sample regret guarantee for our proposed algorithm without imposing absolute continuity. Compared with existing algorithms, by requiring only a minimal data-coverage assumption, STEEL significantly improves the applicability and robustness of batch RL. Extensive simulation studies and one real experiment on personalized pricing demonstrate the superior performance of our method in dealing with possible singularity in batch RL.
    Solving Coupled Differential Equation Groups Using PINO-CDE. (arXiv:2210.00222v2 [cs.LG] UPDATED)
    As a fundamental mathematical tool in many engineering disciplines, coupled differential equation groups are widely used to model complex structures containing multiple physical quantities. Engineers constantly adjust structural parameters at the design stage, which requires a highly efficient solver. The rise of deep learning technologies has offered new perspectives on this task. Unfortunately, existing black-box models suffer from poor accuracy and robustness, while the advanced methodologies of single-output operator regression cannot deal with multiple quantities simultaneously. To address these challenges, we propose PINO-CDE, a deep learning framework for solving coupled differential equation groups (CDEs), along with an equation normalization algorithm for performance enhancement. Based on the theory of the physics-informed neural operator (PINO), PINO-CDE uses a single network for all quantities in a CDE group, instead of training dozens, or even hundreds, of networks as in the existing literature. We demonstrate the flexibility and feasibility of PINO-CDE for one toy example and two engineering applications: vehicle-track coupled dynamics (VTCD) and reliability assessment for a four-storey building (uncertainty propagation). The performance of VTCD indicates that PINO-CDE outperforms existing software and deep learning-based methods in terms of efficiency and precision, respectively. For the uncertainty propagation task, PINO-CDE provides higher-resolution results in less than a quarter of the time incurred when using the probability density evolution method (PDEM). This framework integrates engineering dynamics and deep learning technologies and may reveal a new concept for CDE solving and uncertainty propagation.
    From Contextual Data to Newsvendor Decisions: On the Actual Performance of Data-Driven Algorithms. (arXiv:2302.08424v2 [cs.LG] UPDATED)
    In this work, we explore a framework for contextual decision-making to study how the relevance and quantity of past data affects the performance of a data-driven policy. We analyze a contextual Newsvendor problem in which a decision-maker needs to trade off between an underage and an overage cost in the face of uncertain demand. We consider a setting in which past demands observed under "close by" contexts come from close by distributions and analyze the performance of data-driven algorithms through a notion of context-dependent worst-case expected regret. We analyze the broad class of Weighted Empirical Risk Minimization (WERM) policies which weigh past data according to their similarity in the contextual space. This class includes classical policies such as ERM, k-Nearest Neighbors and kernel-based policies. Our main methodological contribution is to characterize exactly the worst-case regret of any WERM policy on any given configuration of contexts. To the best of our knowledge, this provides the first understanding of tight performance guarantees in any contextual decision-making problem, with past literature focusing on upper bounds via concentration inequalities. We instead take an optimization approach, and isolate a structure in the Newsvendor loss function that allows us to reduce the infinite-dimensional optimization problem over worst-case distributions to a simple line search. This in turn allows us to unveil fundamental insights that were obfuscated by previous general-purpose bounds. We characterize actual guaranteed performance as a function of the contexts, as well as granular insights on the learning curve of algorithms.
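A WERM policy in the newsvendor setting reduces to a similarity-weighted critical quantile of past demand. A sketch with a Gaussian kernel over scalar contexts (an illustrative choice of weighting; ERM is recovered with uniform weights):

```python
import numpy as np

def werm_newsvendor(demands, contexts, query, bandwidth, underage, overage):
    """Weighted-ERM newsvendor decision: order the similarity-weighted
    critical quantile of past demands. The Gaussian kernel here is one
    hypothetical weighting; k-NN or other kernels fit the same template."""
    w = np.exp(-((contexts - query) ** 2) / (2 * bandwidth**2))
    w = w / w.sum()
    order = np.argsort(demands)
    cum = np.cumsum(w[order])
    critical = underage / (underage + overage)   # optimal service level
    return demands[order][np.searchsorted(cum, critical)]

rng = np.random.default_rng(0)
contexts = rng.uniform(0, 1, 300)
demands = 10 * contexts + rng.normal(0, 1, 300)  # demand rises with context
q = werm_newsvendor(demands, contexts, query=0.8, bandwidth=0.1,
                    underage=1.0, overage=1.0)
print(q)  # near the conditional median of demand at context 0.8
```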
    ovla: Neural Network Ownership Verification using Latent Watermarks. (arXiv:2306.13215v1 [cs.CR])
    Ownership verification for neural networks is important for protecting these models from illegal copying, free-riding, re-distribution and other intellectual property misuse. We present a novel methodology for neural network ownership verification based on the notion of latent watermarks. Existing ownership verification methods either modify or introduce constraints to the neural network parameters, which are accessible to an attacker in a white-box attack and can be harmful to the network's normal operation, or train the network to respond to specific watermarks in the inputs similar to data poisoning-based backdoor attacks, which are susceptible to backdoor removal techniques. In this paper, we address these problems by decoupling a network's normal operation from its responses to watermarked inputs during ownership verification. The key idea is to train the network such that the watermarks remain dormant unless the owner's secret key is applied to activate them. The secret key is realized as a specific perturbation of the network's parameters known only to the owner. We show that our approach offers strong defense against backdoor detection, backdoor removal and surrogate model attacks. In addition, our method provides protection against ambiguity attacks where the attacker either tries to guess the secret weight key or uses fine-tuning to embed their own watermarks with a different key into a pre-trained neural network. Experimental results demonstrate the advantages and effectiveness of our proposed approach.
    Prediction under Latent Subgroup Shifts with High-Dimensional Observations. (arXiv:2306.13472v1 [stat.ML])
    We introduce a new approach to prediction in graphical models with latent-shift adaptation, i.e., where source and target environments differ in the distribution of an unobserved confounding latent variable. Previous work has shown that as long as "concept" and "proxy" variables with appropriate dependence are observed in the source environment, the latent-associated distributional changes can be identified, and target predictions adapted accurately. However, practical estimation methods do not scale well when the observations are complex and high-dimensional, even if the confounding latent is categorical. Here we build upon a recently proposed probabilistic unsupervised learning framework, the recognition-parametrised model (RPM), to recover low-dimensional, discrete latents from image observations. Applied to the problem of latent shifts, our novel form of RPM identifies causal latent structure in the source environment, and adapts properly to predict in the target. We demonstrate results in settings where predictor and proxy are high-dimensional images, a context to which previous methods fail to scale.
    Understanding quantum machine learning also requires rethinking generalization. (arXiv:2306.13461v1 [quant-ph])
    Quantum machine learning models have shown successful generalization performance even when trained with few data. In this work, through systematic randomization experiments, we show that traditional approaches to understanding generalization fail to explain the behavior of such quantum models. Our experiments reveal that state-of-the-art quantum neural networks accurately fit random states and random labeling of training data. This ability to memorize random data defies current notions of small generalization error, problematizing approaches that build on complexity measures such as the VC dimension, the Rademacher complexity, and all their uniform relatives. We complement our empirical results with a theoretical construction showing that quantum neural networks can fit arbitrary labels to quantum states, hinting at their memorization ability. Our results do not preclude the possibility of good generalization with few training data but rather rule out any possible guarantees based only on the properties of the model family. These findings expose a fundamental challenge in the conventional understanding of generalization in quantum machine learning and highlight the need for a paradigm shift in the design of quantum models for machine learning tasks.  ( 2 min )
    Network Slicing via Transfer Learning aided Distributed Deep Reinforcement Learning. (arXiv:2301.03262v2 [cs.NI] UPDATED)
Deep reinforcement learning (DRL) has been increasingly employed to handle the dynamic and complex resource management in network slicing. The deployment of DRL policies in real networks, however, is complicated by heterogeneous cell conditions. In this paper, we propose a novel transfer learning (TL) aided multi-agent deep reinforcement learning (MADRL) approach with inter-agent similarity analysis for inter-cell inter-slice resource partitioning. First, we design a coordinated MADRL method with information sharing to intelligently partition resources among slices and manage inter-cell interference. Second, we propose an integrated TL method to transfer the learned DRL policies among different local agents to accelerate policy deployment. The method is composed of a new domain and task similarity measurement approach and a new knowledge transfer approach, which resolve the problems of from whom to transfer and how to transfer. We evaluate the proposed solution with extensive simulations in a system-level simulator and show that our approach outperforms the state-of-the-art solutions in terms of performance, convergence speed and sample efficiency. Moreover, by applying TL, we achieve an additional gain of over 27% compared with the coordinated MADRL approach without TL.
    Breaking the Deadly Triad with a Target Network. (arXiv:2101.08862v9 [cs.LG] UPDATED)
    The deadly triad refers to the instability of a reinforcement learning algorithm when it employs off-policy learning, function approximation, and bootstrapping simultaneously. In this paper, we investigate the target network as a tool for breaking the deadly triad, providing theoretical support for the conventional wisdom that a target network stabilizes training. We first propose and analyze a novel target network update rule which augments the commonly used Polyak-averaging style update with two projections. We then apply the target network and ridge regularization in several divergent algorithms and show their convergence to regularized TD fixed points. Those algorithms are off-policy with linear function approximation and bootstrapping, spanning both policy evaluation and control, as well as both discounted and average-reward settings. In particular, we provide the first convergent linear $Q$-learning algorithms under nonrestrictive and changing behavior policies without bi-level optimization.
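The commonly used Polyak-averaging style update that the proposed rule augments can be sketched as follows (the paper's two additional projections are omitted; `tau` and the toy parameter vectors are illustrative):

```python
def polyak_update(target, online, tau=0.5):
    # soft target-network update:
    # theta_target <- (1 - tau) * theta_target + tau * theta_online
    return [(1 - tau) * t + tau * o for t, o in zip(target, online)]

target = [0.0, 0.0]   # target-network parameters
online = [1.0, 2.0]   # online-network parameters
for _ in range(3):
    target = polyak_update(target, online, tau=0.5)
# target drifts smoothly toward online: here [0.875, 1.75]
```

The slow drift of `target` toward `online` is what provides the stabilizing effect the paper analyzes; the extra projections constrain this update further to obtain convergence guarantees.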
    Self-Supervised Time-to-Event Modeling with Structured Medical Records. (arXiv:2301.03150v2 [cs.LG] UPDATED)
    Time-to-event (TTE) models are used in medicine and other fields for estimating the probability distribution of the time until a specific event occurs. TTE models provide many advantages over classification using fixed time horizons, including naturally handling censored observations, but require more parameters and are challenging to train in settings with limited labeled data. Existing approaches, e.g. proportional hazards or accelerated failure time, employ distributional assumptions to reduce parameters but are vulnerable to model misspecification. In this work, we address these challenges with MOTOR (Many Outcome Time Oriented Representations), a self-supervised model that leverages temporal structure found in collections of timestamped events in electronic health records (EHR) and health insurance claims. MOTOR uses a TTE pretraining objective that predicts the probability distribution of times when events occur, making it well-suited to transfer learning for medical prediction tasks. Having pretrained on EHR and claims data of up to 55M patient records (9B clinical events), we evaluate performance after finetuning for 19 tasks across two datasets. Task-specific models built using MOTOR improve time-dependent C statistics by 4.6% over state-of-the-art while greatly improving sample efficiency, achieving comparable performance to existing methods using only 5% of available task data.  ( 2 min )
    On Sensitivity and Robustness of Normalization Schemes to Input Distribution Shifts in Automatic MR Image Diagnosis. (arXiv:2306.13276v1 [eess.IV])
Magnetic Resonance Imaging (MRI) is considered the gold standard of medical imaging because of the excellent soft-tissue contrast exhibited in the images reconstructed by the MRI pipeline, which in turn enables the human radiologist to discern many pathologies easily. More recently, Deep Learning (DL) models have also achieved state-of-the-art performance in diagnosing multiple diseases using these reconstructed images as input. However, the image reconstruction process within the MRI pipeline, which requires the use of complex hardware and adjustment of a large number of scanner parameters, is highly susceptible to noise of various forms, resulting in arbitrary artifacts within the images. Furthermore, the noise distribution is not stationary and varies within a machine, across machines, and across patients, leading to varying artifacts within the images. Unfortunately, DL models are quite sensitive to these varying artifacts as they lead to changes in the input data distribution between the training and testing phases. The lack of robustness of these models against varying artifacts impedes their use in medical applications where safety is critical. In this work, we focus on improving the generalization performance of these models in the presence of multiple varying artifacts that manifest due to the complexity of the MR data acquisition. In our experiments, we observe that Batch Normalization, a widely used technique during the training of DL models for medical image analysis, is a significant cause of performance degradation in these changing environments. As a solution, we propose to use other normalization techniques, such as Group Normalization (GN) and Layer Normalization (LN), to inject robustness into model performance against varying image artifacts. Through a systematic set of experiments, we show that GN and LN provide better accuracy for various MR artifacts and distribution shifts.  ( 3 min )
    DVFO: Learning-Based DVFS for Energy-Efficient Edge-Cloud Collaborative Inference. (arXiv:2306.01811v3 [cs.LG] UPDATED)
    Due to limited resources on edge and different characteristics of deep neural network (DNN) models, it is a big challenge to optimize DNN inference performance in terms of energy consumption and end-to-end latency on edge devices. In addition to the dynamic voltage frequency scaling (DVFS) technique, the edge-cloud architecture provides a collaborative approach for efficient DNN inference. However, current edge-cloud collaborative inference methods have not optimized various compute resources on edge devices. Thus, we propose DVFO, a novel DVFS-enabled edge-cloud collaborative inference framework, which co-optimizes DVFS and offloading parameters via deep reinforcement learning (DRL). Specifically, DVFO automatically co-optimizes 1) the CPU, GPU and memory frequencies of edge devices, and 2) the feature maps to be offloaded to cloud servers. In addition, it leverages a thinking-while-moving concurrent mechanism to accelerate the DRL learning process, and a spatial-channel attention mechanism to extract DNN feature maps of secondary importance for workload offloading. This approach improves inference performance for different DNN models under various edge-cloud network conditions. Extensive evaluations using two datasets and six widely-deployed DNN models on three heterogeneous edge devices show that DVFO significantly reduces the energy consumption by 33% on average, compared to state-of-the-art schemes. Moreover, DVFO achieves up to 28.6%-59.1% end-to-end latency reduction, while maintaining accuracy within 1% loss on average.  ( 3 min )
    A Machine Learning Pressure Emulator for Hydrogen Embrittlement. (arXiv:2306.13116v1 [cs.LG])
A recent alternative for hydrogen transportation is blending it into natural gas pipelines as a mixture with natural gas. However, hydrogen embrittlement of materials is a major concern for scientists and gas installation designers seeking to avoid process failures. In this paper, we propose a physics-informed machine learning model to predict the gas pressure on the pipes' inner wall. Despite their high-fidelity results, current PDE-based simulators are time- and computationally demanding. Using simulation data, we train an ML model to predict the pressure on the pipelines' inner walls, a first step toward pipeline system surveillance. We found that the physics-based method outperforms the purely data-driven method and satisfies the physical constraints of the gas flow system.  ( 2 min )
    Stochastic Gradient Descent under Markovian Sampling Schemes. (arXiv:2302.14428v3 [math.OC] UPDATED)
We study a variation of vanilla stochastic gradient descent where the optimizer only has access to a Markovian sampling scheme. These schemes encompass applications that range from decentralized optimization with a random walker (token algorithms), to RL and online system identification problems. We focus on obtaining rates of convergence under the least restrictive assumptions possible on the underlying Markov chain and on the functions optimized. We first unveil the theoretical lower bound for methods that sample stochastic gradients along the path of a Markov chain, revealing a dependency on the hitting time of the underlying Markov chain. We then study Markov chain SGD (MC-SGD) under much milder regularity assumptions than prior works (e.g., no bounded gradients or domain, and infinite state spaces). We finally introduce MC-SAG, an alternative to MC-SGD with variance reduction, that only depends on the hitting time of the Markov chain, therefore obtaining a communication-efficient token algorithm.
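A minimal sketch of sampling stochastic gradients along the path of a Markov chain, in the token-algorithm setting: a token walks over a graph, and at each visited node the optimizer steps on that node's local loss only. The graph, step size, and local losses below are illustrative assumptions:

```python
import random

# MC-SGD sketch: minimize the average of f_i(x) = (x - a_i)^2 over nodes,
# but gradients arrive in Markovian (non-i.i.d.) order along a random walk.
random.seed(1)
neighbors = {0: [1], 1: [0, 2], 2: [1]}   # path graph 0 - 1 - 2
targets = {0: 1.0, 1: 2.0, 2: 3.0}        # node i holds f_i(x) = (x - a_i)^2

x, node = 0.0, 0
for t in range(2000):
    grad = 2.0 * (x - targets[node])      # gradient of the LOCAL loss only
    x -= 0.01 * grad
    node = random.choice(neighbors[node]) # token moves to a random neighbor

# x settles near a visit-frequency-weighted average of the targets; for a
# path graph the walk's stationary distribution weights node 1 twice as
# much as the endpoints, giving a value near 2.0
```

The non-i.i.d. visit order is exactly what makes the analysis depend on the chain's mixing and hitting times rather than on a simple sample count.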
    Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation. (arXiv:2211.04772v3 [cs.SD] UPDATED)
Audio Spectrogram Transformer models rule the field of Audio Tagging, outrunning previously dominating Convolutional Neural Networks (CNNs). Their superiority is based on the ability to scale up and exploit large-scale datasets such as AudioSet. However, Transformers are demanding in terms of model size and computational requirements compared to CNNs. We propose a training procedure for efficient CNNs based on offline Knowledge Distillation (KD) from high-performing yet complex transformers. The proposed training schema and the efficient CNN design based on MobileNetV3 result in models outperforming previous solutions in terms of parameter and computational efficiency and prediction performance. We provide models of different complexity levels, scaling from low-complexity models up to a new state-of-the-art performance of .483 mAP on AudioSet. Source code available at: https://github.com/fschmid56/EfficientAT
    MimiC: Combating Client Dropouts in Federated Learning by Mimicking Central Updates. (arXiv:2306.12212v2 [cs.LG] UPDATED)
    Federated learning (FL) is a promising framework for privacy-preserving collaborative learning, where model training tasks are distributed to clients and only the model updates need to be collected at a server. However, when being deployed at mobile edge networks, clients may have unpredictable availability and drop out of the training process, which hinders the convergence of FL. This paper tackles such a critical challenge. Specifically, we first investigate the convergence of the classical FedAvg algorithm with arbitrary client dropouts. We find that with the common choice of a decaying learning rate, FedAvg oscillates around a stationary point of the global loss function, which is caused by the divergence between the aggregated and desired central update. Motivated by this new observation, we then design a novel training algorithm named MimiC, where the server modifies each received model update based on the previous ones. The proposed modification of the received model updates mimics the imaginary central update irrespective of dropout clients. The theoretical analysis of MimiC shows that divergence between the aggregated and central update diminishes with proper learning rates, leading to its convergence. Simulation results further demonstrate that MimiC maintains stable convergence performance and learns better models than the baseline methods.
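As a hedged illustration of compensating for dropouts at the server, the sketch below substitutes each dropped client's most recently cached update into the aggregate so the step tracks the imaginary central update; MimiC's actual correction rule modifies the received updates themselves and differs in detail:

```python
# Simplified dropout-compensation sketch (NOT the exact MimiC rule): the
# server averages over ALL clients every round, reusing a cached copy of the
# last update from any client that dropped out.

def aggregate(round_updates, cache):
    for cid, upd in round_updates.items():
        cache[cid] = upd                   # refresh cache for active clients
    n = len(cache)
    dim = len(next(iter(cache.values())))
    return [sum(u[d] for u in cache.values()) / n for d in range(dim)]

cache = {0: [0.0], 1: [0.0], 2: [0.0]}
step1 = aggregate({0: [3.0], 1: [3.0], 2: [3.0]}, cache)  # all present: [3.0]
step2 = aggregate({0: [6.0]}, cache)                      # clients 1, 2 drop
# step2 is [4.0]: the stale cached updates stand in for the dropouts,
# instead of letting client 0 dominate the round
```

Without any compensation, round two would aggregate only client 0's update, which is the aggregation divergence the paper identifies as the cause of FedAvg's oscillation.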
    Necessary and sufficient graphical conditions for optimal adjustment sets in causal graphical models with hidden variables. (arXiv:2102.10324v4 [cs.LG] UPDATED)
The problem of selecting optimal backdoor adjustment sets to estimate causal effects in graphical models with hidden and conditioned variables is addressed. Previous work has defined optimality as achieving the smallest asymptotic estimation variance and derived an optimal set for the case without hidden variables. For the case with hidden variables there can be settings where no optimal set exists, and currently only a sufficient graphical optimality criterion of limited applicability has been derived. In the present work optimality is characterized as maximizing a certain adjustment information, which allows us to derive a necessary and sufficient graphical criterion for the existence of an optimal adjustment set and a definition and algorithm to construct it. Further, the optimal set is valid if and only if a valid adjustment set exists and has higher (or equal) adjustment information than the Adjust-set proposed in Perković et al. [Journal of Machine Learning Research, 18: 1--62, 2018] for any graph. The results translate to minimal asymptotic estimation variance for a class of estimators whose asymptotic variance follows a certain information-theoretic relation. Numerical experiments indicate that the asymptotic results also hold for relatively small sample sizes and that the optimal adjustment set or minimized variants thereof often yield better variance also beyond that estimator class. Surprisingly, more than 90% of the randomly created setups fulfill the optimality conditions, indicating that graphical optimality may also hold in many real-world scenarios. Code is available as part of the Python package https://github.com/jakobrunge/tigramite.
    Autoencoders for Real-Time SUEP Detection. (arXiv:2306.13595v1 [hep-ex])
Confining dark sectors with pseudo-conformal dynamics can produce Soft Unclustered Energy Patterns, or SUEPs, at the Large Hadron Collider: the production of dark quarks in proton-proton collisions leading to a dark shower and the high-multiplicity production of dark hadrons. The final experimental signature is spherically-symmetric energy deposits by an anomalously large number of soft Standard Model particles with a transverse energy of a few hundred MeV. The dominant background for the SUEP search, if it gets produced via gluon-gluon fusion, is multi-jet QCD events. We have developed a deep learning-based anomaly detection technique to reject QCD jets and identify any anomalous signature, including SUEP, in real time in the High-Level Trigger system of the Compact Muon Solenoid experiment at the Large Hadron Collider. A deep convolutional neural autoencoder network has been trained using QCD events by taking transverse energy deposits in the inner tracker, electromagnetic calorimeter, and hadron calorimeter sub-detectors as 3-channel image data. To tackle the biggest challenge of the task, the sparsity of the data (only ~0.5% of the total ~300k image pixels have non-zero values), a non-standard loss function, the inverse of the so-called Dice loss, has been exploited. The trained autoencoder with learned spatial features of QCD jets can detect 40% of the SUEP events, with a QCD event mistagging rate as low as 2%. The model inference time has been measured using the Intel Core i5-9600KF processor and found to be ~20 ms, which comfortably satisfies the High-Level Trigger system's latency of O(100) ms. Given the unsupervised nature of autoencoder training, the trained model can be applied to any new physics model that predicts an experimental signature anomalous to QCD jets.
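One plausible reading of the inverse Dice loss on soft pixel intensities can be sketched as follows; the exact CMS implementation may differ:

```python
# Hedged sketch: soft Dice coefficient over flattened pixel intensities,
# with the loss taken as its reciprocal. On sparse targets this heavily
# penalizes missed overlap, unlike a plain pixel-wise MSE that a ~99.5%-zero
# image would trivially minimize.

def dice_coeff(pred, truth, eps=1e-6):
    inter = sum(p * t for p, t in zip(pred, truth))
    return (2.0 * inter + eps) / (sum(pred) + sum(truth) + eps)

def inverse_dice_loss(pred, truth):
    # minimized (about 1) at perfect overlap; large when overlap vanishes
    return 1.0 / (dice_coeff(pred, truth) + 1e-6)

perfect = inverse_dice_loss([1.0, 0.0, 1.0], [1.0, 0.0, 1.0])  # close to 1
disjoint = inverse_dice_loss([1.0, 0.0], [0.0, 1.0])           # very large
```

The contrast between `perfect` and `disjoint` shows why an overlap-based loss suits data where only a tiny fraction of pixels carry signal.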
    Efficient Model Selection for Predictive Pattern Mining Model by Safe Pattern Pruning. (arXiv:2306.13561v1 [stat.ML])
    Predictive pattern mining is an approach used to construct prediction models when the input is represented by structured data, such as sets, graphs, and sequences. The main idea behind predictive pattern mining is to build a prediction model by considering substructures, such as subsets, subgraphs, and subsequences (referred to as patterns), present in the structured data as features of the model. The primary challenge in predictive pattern mining lies in the exponential growth of the number of patterns with the complexity of the structured data. In this study, we propose the Safe Pattern Pruning (SPP) method to address the explosion of pattern numbers in predictive pattern mining. We also discuss how it can be effectively employed throughout the entire model building process in practical data analysis. To demonstrate the effectiveness of the proposed method, we conduct numerical experiments on regression and classification problems involving sets, graphs, and sequences.
    Cascade Subspace Clustering for Outlier Detection. (arXiv:2306.13500v1 [cs.CV])
Many methods based on sparse and low-rank representation have been developed with guarantees of correct outlier detection. Self-representation states that a point in a subspace can always be expressed as a linear combination of other points in the subspace. A suitable Markov chain can be defined on the self-representation, allowing us to recognize the difference between inliers and outliers. However, the reconstruction error of the self-representation, which is still informative for outlier detection, is neglected. Inspired by gradient boosting, in this paper we propose a new outlier detection framework that combines a series of weak "outlier detectors" into a single strong one in an iterative fashion by constructing a multi-pass self-representation. At each stage, we construct a self-representation based on the elastic net and define a suitable Markov chain on it to detect outliers. The residual of the self-representation is used in the next stage to learn the next weak outlier detector. This stage is repeated many times, and the final decision on outliers is generated by aggregating the results of all stages. Experimental results on image and speaker datasets demonstrate its superiority with respect to state-of-the-art sparse and low-rank outlier detection methods.
    On the Convergence Rate of Gaussianization with Random Rotations. (arXiv:2306.13520v1 [cs.LG])
    Gaussianization is a simple generative model that can be trained without backpropagation. It has shown compelling performance on low dimensional data. As the dimension increases, however, it has been observed that the convergence speed slows down. We show analytically that the number of required layers scales linearly with the dimension for Gaussian input. We argue that this is because the model is unable to capture dependencies between dimensions. Empirically, we find the same linear increase in cost for arbitrary input $p(x)$, but observe favorable scaling for some distributions. We explore potential speed-ups and formulate challenges for further research.
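The alternation of random rotations with exact marginal Gaussianization can be sketched in two dimensions; rank-based normal scores stand in for the marginal map, and the ring-shaped input is illustrative, not the paper's experimental setup:

```python
import math, random
from statistics import NormalDist

random.seed(0)
nd = NormalDist()

def marginal_gaussianize(col):
    # map each value to the standard-normal quantile of its empirical rank
    n = len(col)
    order = sorted(range(n), key=lambda i: col[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        out[i] = nd.inv_cdf((rank + 0.5) / n)
    return out

def gaussianize_step(xs, ys):
    # one layer: random 2-D rotation followed by exact marginal maps
    theta = random.uniform(0, math.pi)
    c, s = math.cos(theta), math.sin(theta)
    xr = [c * x - s * y for x, y in zip(xs, ys)]
    yr = [s * x + c * y for x, y in zip(xs, ys)]
    return marginal_gaussianize(xr), marginal_gaussianize(yr)

# strongly non-Gaussian input: points on a ring
xs = [math.cos(2 * math.pi * i / 200) for i in range(200)]
ys = [math.sin(2 * math.pi * i / 200) for i in range(200)]
for _ in range(20):
    xs, ys = gaussianize_step(xs, ys)
# marginals are now standard normal by construction; the number of such
# layers needed to also remove cross-dimension dependence is what the
# paper shows grows linearly with dimension
```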
    Manifold Contrastive Learning with Variational Lie Group Operators. (arXiv:2306.13544v1 [cs.LG])
    Self-supervised learning of deep neural networks has become a prevalent paradigm for learning representations that transfer to a variety of downstream tasks. Similar to proposed models of the ventral stream of biological vision, it is observed that these networks lead to a separation of category manifolds in the representations of the penultimate layer. Although this observation matches the manifold hypothesis of representation learning, current self-supervised approaches are limited in their ability to explicitly model this manifold. Indeed, current approaches often only apply augmentations from a pre-specified set of "positive pairs" during learning. In this work, we propose a contrastive learning approach that directly models the latent manifold using Lie group operators parameterized by coefficients with a sparsity-promoting prior. A variational distribution over these coefficients provides a generative model of the manifold, with samples which provide feature augmentations applicable both during contrastive training and downstream tasks. Additionally, learned coefficient distributions provide a quantification of which transformations are most likely at each point on the manifold while preserving identity. We demonstrate benefits in self-supervised benchmarks for image datasets, as well as a downstream semi-supervised task. In the former case, we demonstrate that the proposed methods can effectively apply manifold feature augmentations and improve learning both with and without a projection head. In the latter case, we demonstrate that feature augmentations sampled from learned Lie group operators can improve classification performance when using few labels.  ( 2 min )
    TACO: Temporal Latent Action-Driven Contrastive Loss for Visual Reinforcement Learning. (arXiv:2306.13229v1 [cs.LG])
    Despite recent progress in reinforcement learning (RL) from raw pixel data, sample inefficiency continues to present a substantial obstacle. Prior works have attempted to address this challenge by creating self-supervised auxiliary tasks, aiming to enrich the agent's learned representations with control-relevant information for future state prediction. However, these objectives are often insufficient to learn representations that can represent the optimal policy or value function, and they often consider tasks with small, abstract discrete action spaces and thus overlook the importance of action representation learning in continuous control. In this paper, we introduce TACO: Temporal Action-driven Contrastive Learning, a simple yet powerful temporal contrastive learning approach that facilitates the concurrent acquisition of latent state and action representations for agents. TACO simultaneously learns a state and an action representation by optimizing the mutual information between representations of current states paired with action sequences and representations of the corresponding future states. Theoretically, TACO can be shown to learn state and action representations that encompass sufficient information for control, thereby improving sample efficiency. For online RL, TACO achieves 40% performance boost after one million environment interaction steps on average across nine challenging visual continuous control tasks from Deepmind Control Suite. In addition, we show that TACO can also serve as a plug-and-play module adding to existing offline visual RL methods to establish the new state-of-the-art performance for offline visual RL across offline datasets with varying quality.  ( 3 min )
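The mutual-information objective between (state, action-sequence) representations and future-state representations is typically realized as an InfoNCE-style contrastive loss; a minimal sketch, with identity encoders standing in for the learned state and action encoders, is:

```python
import math

# InfoNCE sketch in the spirit of TACO: queries[i] encodes a current state
# paired with an action sequence, keys[i] encodes the corresponding future
# state; other keys in the batch act as negatives. Encoders, shapes, and the
# temperature value are assumptions for illustration.

def info_nce(queries, keys, temp=0.1):
    loss = 0.0
    for i, q in enumerate(queries):
        logits = [sum(a * b for a, b in zip(q, k)) / temp for k in keys]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_z - logits[i]          # -log softmax of the positive
    return loss / len(queries)

queries = [[1.0, 0.0], [0.0, 1.0]]
keys    = [[1.0, 0.0], [0.0, 1.0]]  # matched pairs: loss near zero
```

Minimizing this loss pushes each query toward its true future state and away from the batch negatives, which is the mechanism by which control-relevant information ends up in both the state and action representations.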
    Recurrent Graph Convolutional Networks for Spatiotemporal Prediction of Snow Accumulation Using Airborne Radar. (arXiv:2302.00817v2 [cs.LG] UPDATED)
    The accurate prediction and estimation of annual snow accumulation has grown in importance as we deal with the effects of climate change and the increase of global atmospheric temperatures. Airborne radar sensors, such as the Snow Radar, are able to measure accumulation rate patterns at a large-scale and monitor the effects of ongoing climate change on Greenland's precipitation and run-off. The Snow Radar's use of an ultra-wide bandwidth enables a fine vertical resolution that helps in capturing internal ice layers. Given the amount of snow accumulation in previous years using the radar data, in this paper, we propose a machine learning model based on recurrent graph convolutional networks to predict the snow accumulation in recent consecutive years at a certain location. We found that the model performs better and with more consistency than equivalent nongeometric and nontemporal models.  ( 2 min )
    A Weighted Autoencoder-Based Approach to Downlink NOMA Constellation Design. (arXiv:2306.13423v1 [eess.SP])
End-to-end design of communication systems using deep autoencoders (AEs) is gaining attention due to its flexibility and excellent performance. Besides single-user transmission, AE-based design has recently been explored in the multi-user setup, e.g., for designing constellations for non-orthogonal multiple access (NOMA). In this paper, we further advance the design of AE-based downlink NOMA by introducing a weighted loss function in the AE training. By changing the weight coefficients, one can flexibly tune the constellation design to balance the error probabilities of different users, without relying on explicit information about their channel quality. Combined with the SICNet decoder, we demonstrate a significant performance improvement and flexible control of the error probabilities of different users using the proposed weighted AE-based framework.  ( 2 min )
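The weighted-loss idea can be sketched as a weighted sum of per-user cross-entropies; the two-user setup and all names below are assumptions for illustration:

```python
import math

# Sketch of a weighted AE training loss: the total loss is a weighted sum of
# each user's symbol cross-entropy, so the weight coefficients trade off the
# users' error probabilities without needing their channel quality.

def cross_entropy(probs, label):
    return -math.log(probs[label])

def weighted_loss(user_probs, user_labels, weights):
    # user_probs[u] is user u's softmax output, user_labels[u] its true symbol
    return sum(w * cross_entropy(p, y)
               for w, p, y in zip(weights, user_probs, user_labels))

probs = [[0.9, 0.1], [0.6, 0.4]]   # user 1 decodes well, user 2 poorly
labels = [0, 0]
balanced = weighted_loss(probs, labels, [0.5, 0.5])
favor_u2 = weighted_loss(probs, labels, [0.2, 0.8])  # penalize user 2 more
```

Raising user 2's weight makes its decoding errors dominate the loss, steering the learned constellation to protect that user.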
    SC-Block: Supervised Contrastive Blocking within Entity Resolution Pipelines. (arXiv:2303.03132v2 [cs.DB] UPDATED)
    The goal of entity resolution is to identify records in multiple datasets that represent the same real-world entity. However, comparing all records across datasets can be computationally intensive, leading to long runtimes. To reduce these runtimes, entity resolution pipelines are constructed of two parts: a blocker that applies a computationally cheap method to select candidate record pairs, and a matcher that afterwards identifies matching pairs from this set using more expensive methods. This paper presents SC-Block, a blocking method that utilizes supervised contrastive learning for positioning records in the embedding space, and nearest neighbour search for candidate set building. We benchmark SC-Block against eight state-of-the-art blocking methods. In order to relate the training time of SC-Block to the reduction of the overall runtime of the entity resolution pipeline, we combine SC-Block with four matching methods into complete pipelines. For measuring the overall runtime, we determine candidate sets with 99.5% pair completeness and pass them to the matcher. The results show that SC-Block is able to create smaller candidate sets and pipelines with SC-Block execute 1.5 to 2 times faster compared to pipelines with other blockers, without sacrificing F1 score. Blockers are often evaluated using relatively small datasets which might lead to runtime effects resulting from a large vocabulary size being overlooked. In order to measure runtimes in a more challenging setting, we introduce a new benchmark dataset that requires large numbers of product offers to be blocked. On this large-scale benchmark dataset, pipelines utilizing SC-Block and the best-performing matcher execute 8 times faster than pipelines utilizing another blocker with the same matcher reducing the runtime from 2.5 hours to 18 minutes, clearly compensating for the 5 minutes required for training SC-Block.  ( 3 min )
    Enhanced Dengue Outbreak Prediction in Tamilnadu using Meteorological and Entomological data. (arXiv:2306.13456v1 [cs.LG])
This paper focuses on studying the impact of climate data and vector larval indices on dengue outbreaks. After a comparative study of various LSTM models, a Bidirectional Stacked LSTM network is selected to analyze the time series climate data and health data collected for the state of Tamil Nadu (India) for the period 2014 to 2020. The prediction accuracy of the model is significantly improved by including the mosquito larval index, an indicator of vector-borne disease (VBD) control measures.  ( 2 min )
    Single-Leg Revenue Management with Advice. (arXiv:2202.10939v3 [cs.GT] UPDATED)
    Single-leg revenue management is a foundational problem of revenue management that has been particularly impactful in the airline and hotel industry: Given $n$ units of a resource, e.g. flight seats, and a stream of sequentially-arriving customers segmented by fares, what is the optimal online policy for allocating the resource. Previous work focused on designing algorithms when forecasts are available, which are not robust to inaccuracies in the forecast, or online algorithms with worst-case performance guarantees, which can be too conservative in practice. In this work, we look at the single-leg revenue management problem through the lens of the algorithms-with-advice framework, which attempts to harness the increasing prediction accuracy of machine learning methods by optimally incorporating advice about the future into online algorithms. In particular, we characterize the Pareto frontier that captures the tradeoff between consistency (performance when advice is accurate) and competitiveness (performance when advice is inaccurate) for every advice. Moreover, we provide an online algorithm that always achieves performance on this Pareto frontier. We also study the class of protection level policies, which is the most widely-deployed technique for single-leg revenue management: we provide an algorithm to incorporate advice into protection levels that optimally trades off consistency and competitiveness. Moreover, we empirically evaluate the performance of these algorithms on synthetic data. We find that our algorithm for protection level policies performs remarkably well on most instances, even if it is not guaranteed to be on the Pareto frontier in theory. Our results extend to other unit-cost online allocations problems such as the display advertising and the multiple secretary problem together with more general variable-cost problems such as the online knapsack problem.  ( 3 min )
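A basic protection-level policy, the policy class the paper augments with advice, can be simulated as follows; the fares, capacity, and arrival pattern are illustrative:

```python
# Protection-level sketch: reserve ("protect") some seats for the high-fare
# class and sell to low-fare customers only while more than that many seats
# remain. The protection level is a free parameter here; the paper's
# algorithm sets it by optimally incorporating advice.

def simulate(capacity, protect, arrivals, fares=(100.0, 300.0)):
    # arrivals: sequence of 0 (low fare) / 1 (high fare), in arrival order
    seats, revenue = capacity, 0.0
    for cls in arrivals:
        if seats == 0:
            break
        if cls == 1 or seats > protect:   # always accept high fare;
            seats -= 1                    # low fare only above the level
            revenue += fares[cls]
    return revenue

# low-fare demand arrives first; protecting seats saves them for high fare
greedy = simulate(10, 0, [0] * 10 + [1] * 5)      # sells out to low fare
protected = simulate(10, 5, [0] * 10 + [1] * 5)   # saves 5 seats
```

With no protection the policy earns 1000 from low fares alone; protecting five seats earns 2000, illustrating the consistency/competitiveness tension: the right level depends on how accurate the forecast (advice) of high-fare demand is.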
    Directional diffusion models for graph representation learning. (arXiv:2306.13210v1 [cs.LG])
In recent years, diffusion models have achieved remarkable success in various domains of artificial intelligence, such as image synthesis, super-resolution, and 3D molecule generation. However, the application of diffusion models in graph learning has received relatively little attention. In this paper, we address this gap by investigating the use of diffusion models for unsupervised graph representation learning. We begin by identifying the anisotropic structures of graphs and a crucial limitation of the vanilla forward diffusion process in learning anisotropic structures. This process relies on continuously adding an isotropic Gaussian noise to the data, which may convert the anisotropic signals to noise too quickly. This rapid conversion hampers the training of denoising neural networks and impedes the acquisition of semantically meaningful representations in the reverse process. To address this challenge, we propose a new class of models called directional diffusion models. These models incorporate data-dependent, anisotropic, and directional noises in the forward diffusion process. To assess the efficacy of our proposed models, we conduct extensive experiments on 12 publicly available datasets, focusing on two distinct graph representation learning tasks. The experimental results demonstrate the superiority of our models over state-of-the-art baselines, indicating their effectiveness in capturing meaningful graph representations. Our studies not only provide valuable insights into the forward process of diffusion models but also highlight the wide-ranging potential of these models for various graph-related tasks.  ( 2 min )
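The contrast between isotropic and directional forward noising can be sketched numerically; a fixed axis stands in for the learned, data-dependent direction, and all names are illustrative rather than the paper's parameterization:

```python
import random

# Anisotropic data has a strong direction (x) and a weak one (y). Isotropic
# forward noise drowns the weak y-axis signal almost immediately, while
# direction-aware noise scales down along the weak axis and preserves it.
random.seed(0)

def noisy_step(x, y, sigma=(1.0, 1.0)):
    return (x + sigma[0] * random.gauss(0, 1),
            y + sigma[1] * random.gauss(0, 1))

pts = [(random.gauss(0, 3), random.gauss(0, 0.1)) for _ in range(500)]

iso = [noisy_step(x, y, (1.0, 1.0)) for x, y in pts]          # isotropic
directional = [noisy_step(x, y, (1.0, 0.1)) for x, y in pts]  # directional

def var(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

# the weak y-axis structure survives directional noising far better: its
# variance stays near the data's own, instead of being dominated by noise
```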
    Evaluating the Robustness of Text-to-image Diffusion Models against Real-world Attacks. (arXiv:2306.13103v1 [cs.CR])
    Text-to-image (T2I) diffusion models (DMs) have shown promise in generating high-quality images from textual descriptions. The real-world applications of these models require particular attention to their safety and fidelity, but this has not been sufficiently explored. One fundamental question is whether existing T2I DMs are robust against variations over input texts. To answer it, this work provides the first robustness evaluation of T2I DMs against real-world attacks. Unlike prior studies that focus on malicious attacks involving apocryphal alterations to the input texts, we consider an attack space spanned by realistic errors (e.g., typo, glyph, phonetic) that humans can make, to ensure semantic consistency. Given the inherent randomness of the generation process, we develop novel distribution-based attack objectives to mislead T2I DMs. We perform attacks in a black-box manner without any knowledge of the model. Extensive experiments demonstrate the effectiveness of our method for attacking popular T2I DMs and simultaneously reveal their non-trivial robustness issues. Moreover, we provide an in-depth analysis of our method to show that it is not designed to attack the text encoder in T2I DMs solely.  ( 2 min )
    Explainable Lifelong Stream Learning Based on "Glocal" Pairwise Fusion. (arXiv:2306.13410v1 [cs.LG])
    Real-time on-device continual learning applications are used on mobile phones, consumer robots, and smart appliances. Such devices have limited processing and memory storage capabilities, whereas continual learning acquires data over a long period of time. By necessity, lifelong learning algorithms have to be able to operate under such constraints while delivering good performance. This study presents the Explainable Lifelong Learning (ExLL) model, which incorporates several important traits: 1) learning to learn, in a single pass, from streaming data with scarce examples and resources; 2) a self-organizing prototype-based architecture that expands as needed and clusters streaming data into separable groups by similarity and preserves data against catastrophic forgetting; 3) an interpretable architecture to convert the clusters into explainable IF-THEN rules as well as to justify model predictions in terms of what is similar and dissimilar to the inference; and 4) inferences at the global and local level using a pairwise decision fusion process to enhance the accuracy of the inference, hence ``Glocal Pairwise Fusion.'' We compare ExLL against contemporary online learning algorithms for image recognition, using OpenLoris, F-SIOL-310, and Places datasets to evaluate several continual learning scenarios for video streams, low-sample learning, ability to scale, and imbalanced data streams. The algorithms are evaluated for their performance in accuracy, number of parameters, and experiment runtime requirements. ExLL outperforms all algorithms for accuracy in the majority of the tested scenarios.  ( 2 min )
    Online Resource Allocation under Horizon Uncertainty. (arXiv:2206.13606v3 [cs.DS] UPDATED)
    We study stochastic online resource allocation: a decision maker needs to allocate limited resources to stochastically-generated sequentially-arriving requests in order to maximize reward. At each time step, requests are drawn independently from a distribution that is unknown to the decision maker. Online resource allocation and its special cases have been studied extensively in the past, but prior results crucially and universally rely on the strong assumption that the total number of requests (the horizon) is known to the decision maker in advance. In many applications, such as revenue management and online advertising, the number of requests can vary widely because of fluctuations in demand or user traffic intensity. In this work, we develop online algorithms that are robust to horizon uncertainty. In sharp contrast to the known-horizon setting, no algorithm can achieve even a constant asymptotic competitive ratio that is independent of the horizon uncertainty. We introduce a novel generalization of dual mirror descent which allows the decision maker to specify a schedule of time-varying target consumption rates, and prove corresponding performance guarantees. We go on to give a fast algorithm for computing a schedule of target consumption rates that leads to near-optimal performance in the unknown-horizon setting. In particular, our competitive ratio attains the optimal rate of growth (up to logarithmic factors) as the horizon uncertainty grows large. Finally, we also provide a way to incorporate machine-learned predictions about the horizon which interpolates between the known and unknown horizon settings.  ( 3 min )
    Large-step neural network for learning the symplectic evolution from partitioned data. (arXiv:2208.14148v2 [astro-ph.EP] UPDATED)
    In this study, we focus on learning Hamiltonian systems, which involves predicting the coordinate (q) and momentum (p) variables generated by a symplectic mapping. Based on Chen & Tao (2021), the symplectic mapping is represented by a generating function. To extend the prediction time period, we develop a new learning scheme by splitting the time series (q_i, p_i) into several partitions. We then train a large-step neural network (LSNN) to approximate the generating function between the first partition (i.e. the initial condition) and each one of the remaining partitions. This partition approach makes our LSNN effectively suppress the accumulative error when predicting the system evolution. Then we train the LSNN to learn the motions of the 2:3 resonant Kuiper belt objects for a long time period of 25000 yr. The results show that there are two significant improvements over the neural network constructed in our previous work (Li et al. 2022): (1) the conservation of the Jacobi integral, and (2) the highly accurate predictions of the orbital evolution. Overall, we propose that the designed LSNN has the potential to considerably improve predictions of the long-term evolution of more general Hamiltonian systems.  ( 3 min )
    Physics-constrained Random Forests for Turbulence Model Uncertainty Estimation. (arXiv:2306.13370v1 [cs.LG])
    To achieve virtual certification for industrial design, quantifying the uncertainties in simulation-driven processes is crucial. We discuss a physics-constrained approach to account for the epistemic uncertainty of turbulence models. In order to eliminate user input, we incorporate a data-driven machine learning strategy. In addition, our study focuses on developing an a priori estimate of prediction confidence when accurate data is scarce.  ( 2 min )
    TACOformer:Token-channel compounded Cross Attention for Multimodal Emotion Recognition. (arXiv:2306.13592v1 [cs.MM])
    Recently, emotion recognition based on physiological signals has emerged as a field of intensive research. The utilization of multi-modal, multi-channel physiological signals has significantly improved the performance of emotion recognition systems, due to their complementarity. However, effectively integrating emotion-related semantic information from different modalities and capturing inter-modal dependencies remains a challenging issue. Many existing multimodal fusion methods ignore either token-to-token or channel-to-channel correlations of multichannel signals from different modalities, which limits the classification capability of the models to some extent. In this paper, we propose a comprehensive perspective of multimodal fusion that integrates channel-level and token-level cross-modal interactions. Specifically, we introduce a unified cross attention module called Token-chAnnel COmpound (TACO) Cross Attention to perform multimodal fusion, which simultaneously models channel-level and token-level dependencies between modalities. Additionally, we propose a 2D position encoding method to preserve information about the spatial distribution of EEG signal channels, then we use two transformer encoders ahead of the fusion module to capture long-term temporal dependencies from the EEG signal and the peripheral physiological signal, respectively. Subject-independent experiments on the emotion datasets DEAP and Dreamer demonstrate that the proposed model achieves state-of-the-art performance.  ( 2 min )
    Logarithmic Regret for Matrix Games against an Adversary with Noisy Bandit Feedback. (arXiv:2306.13233v1 [cs.LG])
    This paper considers a variant of zero-sum matrix games where at each timestep the row player chooses row $i$, the column player chooses column $j$, and the row player receives a noisy reward with mean $A_{i,j}$. The objective of the row player is to accumulate as much reward as possible, even against an adversarial column player. If the row player uses the EXP3 strategy, an algorithm known for obtaining $\sqrt{T}$ regret against an arbitrary sequence of rewards, it is immediate that the row player also achieves $\sqrt{T}$ regret relative to the Nash equilibrium in this game setting. However, partly motivated by the fact that the EXP3 strategy is myopic to the structure of the game, O'Donoghue et al. (2021) proposed a UCB-style algorithm that leverages the game structure and demonstrated that this algorithm greatly outperforms EXP3 empirically. While they showed that this UCB-style algorithm achieved $\sqrt{T}$ regret, in this paper we ask if there exists an algorithm that provably achieves $\text{polylog}(T)$ regret against any adversary, analogous to results from stochastic bandits. We propose a novel algorithm that answers this question in the affirmative for the simple $2 \times 2$ setting, providing the first instance-dependent guarantees for games in the regret setting. Our algorithm overcomes two major hurdles: 1) obtaining logarithmic regret even though the Nash equilibrium is estimable only at a $1/\sqrt{T}$ rate, and 2) designing row-player strategies that guarantee that either the adversary provides information about the Nash equilibrium, or the row player incurs negative regret. Moreover, in the full information case we address the general $n \times m$ case where the first hurdle is still relevant. Finally, we show that EXP3 and the UCB-based algorithm necessarily cannot perform better than $\sqrt{T}$.  ( 3 min )
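The EXP3 baseline that this abstract compares against can be sketched as follows: exponential weights over arms with importance-weighted reward estimates and gamma-uniform exploration. This is a minimal generic EXP3, not the paper's proposed algorithm; the learning rate, horizon, and reward function below are illustrative.

```python
import math
import random

def exp3(reward_fn, K, T, eta, gamma):
    """Minimal EXP3: exponential weights with importance-weighted
    reward estimates (rewards assumed to lie in [0, 1])."""
    weights = [1.0] * K
    total = 0.0
    for t in range(T):
        s = sum(weights)
        # mix in gamma-uniform exploration so probabilities stay bounded away from 0
        probs = [(1 - gamma) * w / s + gamma / K for w in weights]
        r, acc, arm = random.random(), 0.0, K - 1
        for i, p in enumerate(probs):
            acc += p
            if r < acc:
                arm = i
                break
        reward = reward_fn(t, arm)
        total += reward
        # unbiased estimate: observed reward divided by its draw probability
        weights[arm] *= math.exp(eta * reward / probs[arm])
    return total

random.seed(0)
# Arm 1 is uniformly better; EXP3 should concentrate its play on it.
gain = exp3(lambda t, a: 0.9 if a == 1 else 0.1, K=2, T=2000, eta=0.05, gamma=0.1)
```

In the matrix-game setting of the paper, the column player's (adversarial) choices determine the reward distribution, but the row player can still run EXP3 verbatim, which is what yields the $\sqrt{T}$ guarantee the abstract mentions.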
    FedSelect: Customized Selection of Parameters for Fine-Tuning during Personalized Federated Learning. (arXiv:2306.13264v1 [cs.LG])
    Recent advancements in federated learning (FL) seek to increase client-level performance by fine-tuning client parameters on local data or personalizing architectures for the local task. Existing methods for such personalization either prune a global model or fine-tune a global model on a local client distribution. However, these existing methods either personalize at the expense of retaining important global knowledge, or predetermine network layers for fine-tuning, resulting in suboptimal storage of global knowledge within client models. Enlightened by the lottery ticket hypothesis, we first introduce a hypothesis for finding optimal client subnetworks to locally fine-tune while leaving the rest of the parameters frozen. We then propose a novel FL framework, FedSelect, using this procedure that directly personalizes both client subnetwork structure and parameters, via the simultaneous discovery of optimal parameters for personalization and the rest of parameters for global aggregation during training. We show that this method achieves promising results on CIFAR-10.  ( 2 min )
    Pruning for Better Domain Generalizability. (arXiv:2306.13237v1 [cs.LG])
    In this paper, we investigate whether pruning can serve as a reliable method to boost the generalization ability of a model. We find that existing pruning methods like L2 can already offer a small improvement in target-domain performance. We further propose a novel pruning scoring method, called DSS, designed not to maintain source accuracy, as in typical pruning work, but to directly enhance the robustness of the model. We conduct empirical experiments to validate our method and demonstrate that it can even be combined with state-of-the-art generalization work like MIRO (Cha et al., 2022) to further boost performance. On MNIST to MNIST-M, we improve the baseline performance by over 5 points by introducing 60% channel sparsity into the model. On the DomainBed benchmark with state-of-the-art MIRO, we further boost its performance by 1 point by introducing only 10% sparsity into the model. Code can be found at: https://github.com/AlexSunNik/Pruning-for-Better-Domain-Generalizability  ( 2 min )
    Decoding Urban-health Nexus: Interpretable Machine Learning Illuminates Cancer Prevalence based on Intertwined City Features. (arXiv:2306.11847v2 [cs.LG] UPDATED)
    This study investigates the interplay among social demographics, built environment characteristics, and environmental hazard exposure features in determining community level cancer prevalence. Utilizing data from five Metropolitan Statistical Areas in the United States: Chicago, Dallas, Houston, Los Angeles, and New York, the study implemented an XGBoost machine learning model to predict the extent of cancer prevalence and evaluate the importance of different features. Our model demonstrates reliable performance, with results indicating that age, minority status, and population density are among the most influential factors in cancer prevalence. We further explore urban development and design strategies that could mitigate cancer prevalence, focusing on green space, developed areas, and total emissions. Through a series of experimental evaluations based on causal inference, the results show that increasing green space and reducing developed areas and total emissions could alleviate cancer prevalence. The study and findings contribute to a better understanding of the interplay among urban features and community health and also show the value of interpretable machine learning models for integrated urban design to promote public health. The findings also provide actionable insights for urban planning and design, emphasizing the need for a multifaceted approach to addressing urban health disparities through integrated urban design strategies.  ( 2 min )
    MarioGPT: Open-Ended Text2Level Generation through Large Language Models. (arXiv:2302.05981v2 [cs.AI] CROSS LISTED)
    Procedural Content Generation (PCG) algorithms provide a technique to generate complex and diverse environments in an automated way. However, while generating content with PCG methods is often straightforward, generating meaningful content that reflects specific intentions and constraints remains challenging. Furthermore, many PCG algorithms lack the ability to generate content in an open-ended manner. Recently, Large Language Models (LLMs) have been shown to be incredibly effective in many diverse domains. These trained LLMs can be fine-tuned, re-using information and accelerating training for new tasks. In this work, we introduce MarioGPT, a fine-tuned GPT2 model trained to generate tile-based game levels, in our case Super Mario Bros levels. We show that MarioGPT can not only generate diverse levels, but can be text-prompted for controllable level generation, addressing one of the key challenges of current PCG techniques. As far as we know, MarioGPT is the first text-to-level model. We also combine MarioGPT with novelty search, enabling it to generate diverse levels with varying play-style dynamics (i.e. player paths). This combination allows for the open-ended generation of an increasingly diverse range of content.  ( 2 min )
    EEG Decoding for Datasets with Heterogenous Electrode Configurations using Transfer Learning Graph Neural Networks. (arXiv:2306.13109v1 [eess.SP])
    Brain-Machine Interfacing (BMI) has greatly benefited from adopting machine learning methods for feature learning that require extensive data for training, which are often unavailable from a single dataset. Yet, it is difficult to combine data across labs or even data within the same lab collected over the years due to the variation in recording equipment and electrode layouts resulting in shifts in data distribution, changes in data dimensionality, and altered identity of data dimensions. Our objective is to overcome this limitation and learn from many different and diverse datasets across labs with different experimental protocols. To tackle the domain adaptation problem, we developed a novel machine learning framework combining graph neural networks (GNNs) and transfer learning methodologies for non-invasive Motor Imagery (MI) EEG decoding, as an example of BMI. Empirically, we focus on the challenges of learning from EEG data with different electrode layouts and varying numbers of electrodes. We utilise three MI EEG databases collected using very different numbers of EEG sensors (from 22 channels to 64) and layouts (from custom layouts to 10-20). Our model achieved the highest accuracy with lower standard deviations on the testing datasets. This indicates that the GNN-based transfer learning framework can effectively aggregate knowledge from multiple datasets with different electrode layouts, leading to improved generalization in subject-independent MI EEG classification. The findings of this study have important implications for Brain-Computer-Interface (BCI) research, as they highlight a promising method for overcoming the limitations posed by non-unified experimental setups. By enabling the integration of diverse datasets with varying electrode layouts, our proposed approach can help advance the development and application of BMI technologies.  ( 3 min )
    Key Frame Extraction with Attention Based Deep Neural Networks. (arXiv:2306.13176v1 [cs.CV])
    Automatic keyframe detection from videos is the task of selecting the scenes that best summarize the content of long videos. Providing a summary of a video is important for facilitating quick browsing and content summarization. The resulting frames are used for automated tasks (e.g. summarizing security footage, detecting the different scenes used in music clips) in various industries. In addition, processing high-volume videos with advanced machine learning methods incurs resource costs; the obtained keyframes can instead serve as input features for those methods and models. In this study, we propose a deep learning-based approach for keyframe detection using a deep autoencoder model with an attention layer. The proposed method first extracts features from the video frames using the encoder part of the autoencoder and applies the k-means clustering algorithm to group these features and their similar frames together. Keyframes are then selected from each cluster by choosing the frames closest to the cluster centers. The method was evaluated on the TVSUM video dataset and achieved a classification accuracy of 0.77, indicating a higher success rate than many existing methods. The proposed method offers a promising solution for keyframe extraction in video analysis and can be applied to various applications such as video summarization and video retrieval.  ( 2 min )
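The cluster-then-pick-nearest step of this pipeline can be sketched in plain Python. Here the frame features are supplied directly as toy 2-D vectors rather than coming from the paper's attention-based autoencoder, and centroids are seeded from evenly spaced frames for determinism; all names and parameters are illustrative.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_keyframes(features, k, iters):
    """Cluster frame features with k-means, then return the index of
    the frame closest to each cluster center (one keyframe per cluster)."""
    n, dim = len(features), len(features[0])
    # seed centroids from evenly spaced frames so the run is deterministic
    centroids = [list(features[i * (n - 1) // max(k - 1, 1)]) for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for idx, f in enumerate(features):
            j = min(range(k), key=lambda c: dist2(f, centroids[c]))
            clusters[j].append(idx)
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = [sum(features[i][d] for i in members) / len(members)
                                for d in range(dim)]
    return sorted(min(range(n), key=lambda i: dist2(features[i], c))
                  for c in centroids)

rng = random.Random(1)
# Two well-separated "scenes" in a toy 2-D feature space:
frames = ([[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(10)] +
          [[rng.gauss(5, 0.1), rng.gauss(5, 0.1)] for _ in range(10)])
keys = kmeans_keyframes(frames, k=2, iters=10)
```

With two well-separated scenes, the method returns one representative frame index from each, which is the behavior the summarization step relies on.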
    Reinforcement Learning in Non-Markovian Environments. (arXiv:2211.01595v2 [eess.SY] UPDATED)
    Motivated by the novel paradigm developed by Van Roy and coauthors for reinforcement learning in arbitrary non-Markovian environments, we propose a related formulation and explicitly pin down the error caused by non-Markovianity of observations when the Q-learning algorithm is applied on this formulation. Based on this observation, we propose that the criterion for agent design should be to seek good approximations for certain conditional laws. Inspired by classical stochastic control, we show that our problem reduces to that of recursive computation of approximate sufficient statistics. This leads to an autoencoder-based scheme for agent design which is then numerically tested on partially observed reinforcement learning environments.  ( 2 min )
    Differentially Private Synthetic Data Using KD-Trees. (arXiv:2306.13211v1 [cs.CR])
    Creation of a synthetic dataset that faithfully represents the data distribution and simultaneously preserves privacy is a major research challenge. Many space partitioning based approaches have emerged in recent years for answering statistical queries in a differentially private manner. However, for synthetic data generation problem, recent research has been mainly focused on deep generative models. In contrast, we exploit space partitioning techniques together with noise perturbation and thus achieve intuitive and transparent algorithms. We propose both data independent and data dependent algorithms for $\epsilon$-differentially private synthetic data generation whose kernel density resembles that of the real dataset. Additionally, we provide theoretical results on the utility-privacy trade-offs and show how our data dependent approach overcomes the curse of dimensionality and leads to a scalable algorithm. We show empirical utility improvements over the prior work, and discuss performance of our algorithm on a downstream classification task on a real dataset.  ( 2 min )
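The "space partitioning plus noise perturbation" idea can be illustrated with a deliberately simplified 1-D version: perturb fixed-grid histogram counts with Laplace(1/epsilon) noise (count queries have sensitivity 1), then resample points uniformly within each bin. The paper's approach uses adaptive KD-tree partitions rather than a fixed grid, so this is a toy stand-in, not the proposed algorithm.

```python
import random

def dp_synthetic_1d(data, bins, lo, hi, epsilon, rng):
    """Histogram-based DP synthesis sketch: noisy bin counts, then
    uniform resampling inside each bin."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        i = min(max(int((x - lo) / width), 0), bins - 1)
        counts[i] += 1
    # Laplace(1/epsilon) noise as a difference of two Exp(epsilon) draws,
    # clipped so counts stay non-negative
    noisy = [max(0, round(c + rng.expovariate(epsilon) - rng.expovariate(epsilon)))
             for c in counts]
    synthetic = []
    for i, c in enumerate(noisy):
        synthetic.extend(lo + (i + rng.random()) * width for _ in range(c))
    return synthetic

rng = random.Random(0)
data = [rng.gauss(0.5, 0.1) for _ in range(1000)]
synth = dp_synthetic_1d(data, bins=20, lo=0.0, hi=1.0, epsilon=1.0, rng=rng)
```

The appeal of such partition-based schemes, as the abstract notes, is transparency: the privacy analysis reduces to noisy counting, while the data-dependent KD-tree variant adapts the partition to the data to escape the curse of dimensionality.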
    DISCO-10M: A Large-Scale Music Dataset. (arXiv:2306.13512v1 [cs.SD])
    Music datasets play a crucial role in advancing research in machine learning for music. However, existing music datasets suffer from limited size, accessibility, and lack of audio resources. To address these shortcomings, we present DISCO-10M, a novel and extensive music dataset that surpasses the largest previously available music dataset by an order of magnitude. To ensure high-quality data, we implement a multi-stage filtering process. This process incorporates similarities based on textual descriptions and audio embeddings. Moreover, we provide precomputed CLAP embeddings alongside DISCO-10M, facilitating direct application on various downstream tasks. These embeddings enable efficient exploration of machine learning applications on the provided data. With DISCO-10M, we aim to democratize and facilitate new research to help advance the development of novel machine learning models for music.  ( 2 min )
    Bayesian Networks for the robust and unbiased prediction of depression and its symptoms utilizing speech and multimodal data. (arXiv:2211.04924v2 [cs.LG] UPDATED)
    Predicting the presence of major depressive disorder (MDD) using behavioural and cognitive signals is a highly non-trivial task. The heterogeneous clinical profile of MDD means that any given speech, facial expression and/or observed cognitive pattern may be associated with a unique combination of depressive symptoms. Conventional discriminative machine learning models potentially lack the complexity to robustly model this heterogeneity. Bayesian networks, however, may instead be well-suited to such a scenario. These networks are probabilistic graphical models that efficiently describe the joint probability distribution over a set of random variables by explicitly capturing their conditional dependencies. This framework provides further advantages over standard discriminative modelling by offering the possibility to incorporate expert opinion in the graphical structure of the models, generating explainable model predictions, informing about the uncertainty of predictions, and naturally handling missing data. In this study, we apply a Bayesian framework to capture the relationships between depression, depression symptoms, and features derived from speech, facial expression and cognitive game data collected at thymia.  ( 2 min )
    Multi-task Learning for Radar Signal Characterisation. (arXiv:2306.13105v1 [eess.SP])
    Radio signal recognition is a crucial task in both civilian and military applications, as accurate and timely identification of unknown signals is an essential part of spectrum management and electronic warfare. The majority of research in this field has focused on applying deep learning for modulation classification, leaving the task of signal characterisation as an understudied area. This paper addresses this gap by presenting an approach for tackling radar signal classification and characterisation as a multi-task learning (MTL) problem. We propose the IQ Signal Transformer (IQST) among several reference architectures that allow for simultaneous optimisation of multiple regression and classification tasks. We demonstrate the performance of our proposed MTL model on a synthetic radar dataset, while also providing a first-of-its-kind benchmark for radar signal characterisation.  ( 2 min )
    Two derivations of Principal Component Analysis on datasets of distributions. (arXiv:2306.13503v1 [stat.ML])
    In this brief note, we formulate Principal Component Analysis (PCA) over datasets consisting not of points but of distributions, characterized by their location and covariance. Just like the usual PCA on points can be equivalently derived via a variance-maximization principle and via a minimization of reconstruction error, we derive a closed-form solution for distributional PCA from both of these perspectives.  ( 2 min )
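One plausible instantiation of the variance-maximization derivation (not a verbatim reproduction of the note's closed form): for distributions with locations $\mu_i$ and covariances $\Sigma_i$, the variance of the data projected onto a unit vector $w$ is $w^\top(\mathrm{Cov}(\mu) + \overline{\Sigma})\,w$, so the principal directions are the top eigenvectors of that combined matrix.

```python
import numpy as np

def distributional_pca(means, covs, n_components=1):
    """PCA over distributions given by (location, covariance) pairs:
    eigendecompose the covariance of the locations plus the average
    within-distribution covariance."""
    M = np.cov(np.asarray(means), rowvar=False, bias=True) + np.mean(covs, axis=0)
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order], eigvals[order]

# Locations spread along the x-axis, each with small isotropic covariance:
means = [[0.0, 0.0], [2.0, 0.1], [4.0, -0.1], [6.0, 0.0]]
covs = [0.01 * np.eye(2)] * 4
W, variances = distributional_pca(means, covs)
# first principal direction is (close to) the x-axis
```

When every $\Sigma_i = 0$, the distributions collapse to points and this reduces exactly to ordinary PCA on the locations, which is the consistency check one would want from either derivation.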
    Precise Asymptotic Generalization for Multiclass Classification with Overparameterized Linear Models. (arXiv:2306.13255v1 [cs.LG])
    We study the asymptotic generalization of an overparameterized linear model for multiclass classification under the Gaussian covariates bi-level model introduced in Subramanian et al.~'22, where the number of data points, features, and classes all grow together. We fully resolve the conjecture posed in Subramanian et al.~'22, matching the predicted regimes for generalization. Furthermore, our new lower bounds are akin to an information-theoretic strong converse: they establish that the misclassification rate goes to 0 or 1 asymptotically. One surprising consequence of our tight results is that the min-norm interpolating classifier can be asymptotically suboptimal relative to noninterpolating classifiers in the regime where the min-norm interpolating regressor is known to be optimal. The key to our tight analysis is a new variant of the Hanson-Wright inequality which is broadly useful for multiclass problems with sparse labels. As an application, we show that the same type of analysis can be used to analyze the related multilabel classification problem under the same bi-level ensemble.  ( 2 min )
    Uniform Convergence with Square-Root Lipschitz Loss. (arXiv:2306.13188v1 [stat.ML])
    We establish generic uniform convergence guarantees for Gaussian data in terms of the Rademacher complexity of the hypothesis class and the Lipschitz constant of the square root of the scalar loss function. We show how these guarantees substantially generalize previous results based on smoothness (Lipschitz constant of the derivative), and allow us to handle the broader class of square-root-Lipschitz losses, which includes also non-smooth loss functions appropriate for studying phase retrieval and ReLU regression, as well as rederive and better understand "optimistic rate" and interpolation learning guarantees.  ( 2 min )
    MBrain: A Multi-channel Self-Supervised Learning Framework for Brain Signals. (arXiv:2306.13102v1 [eess.SP])
    Brain signals are important quantitative data for understanding the physiological activities and diseases of the human brain. Most existing studies pay attention to supervised learning methods, which, however, require high-cost clinical labels. In addition, the huge difference in the clinical patterns of brain signals measured by invasive (e.g., SEEG) and non-invasive (e.g., EEG) methods leads to the lack of a unified method. To handle the above issues, we propose to study the self-supervised learning (SSL) framework for brain signals that can be applied to pre-train either SEEG or EEG data. Intuitively, brain signals, generated by the firing of neurons, are transmitted among different connecting structures in the human brain. Inspired by this, we propose MBrain to learn implicit spatial and temporal correlations between different channels (i.e., contacts of the electrode, corresponding to different brain areas) as the cornerstone for uniformly modeling different types of brain signals. Specifically, we represent the spatial correlation by a graph structure, which is built with the proposed multi-channel CPC. We theoretically prove that optimizing the goal of multi-channel CPC can lead to a better predictive representation, and apply the instantaneous-time-shift prediction task based on it. Then we capture the temporal correlation by designing the delayed-time-shift prediction task. Finally, a replace-discriminative-learning task is proposed to preserve the characteristics of each channel. Extensive experiments of seizure detection on both EEG and SEEG large-scale real-world datasets demonstrate that our model outperforms several state-of-the-art time series SSL and unsupervised models, and has the ability to be deployed to clinical practice.  ( 3 min )
    Human-in-the-Loop Optimization for Deep Stimulus Encoding in Visual Prostheses. (arXiv:2306.13104v1 [q-bio.NC])
    Neuroprostheses show potential in restoring lost sensory function and enhancing human capabilities, but the sensations produced by current devices often seem unnatural or distorted. Exact placement of implants and differences in individual perception lead to significant variations in stimulus response, making personalized stimulus optimization a key challenge. Bayesian optimization could be used to optimize patient-specific stimulation parameters with limited noisy observations, but is not feasible for high-dimensional stimuli. Alternatively, deep learning models can optimize stimulus encoding strategies, but typically assume perfect knowledge of patient-specific variations. Here we propose a novel, practically feasible approach that overcomes both of these fundamental limitations. First, a deep encoder network is trained to produce optimal stimuli for any individual patient by inverting a forward model mapping electrical stimuli to visual percepts. Second, a preferential Bayesian optimization strategy utilizes this encoder to optimize patient-specific parameters for a new patient, using a minimal number of pairwise comparisons between candidate stimuli. We demonstrate the viability of this approach on a novel, state-of-the-art visual prosthesis model. We show that our approach quickly learns a personalized stimulus encoder, leads to dramatic improvements in the quality of restored vision, and is robust to noisy patient feedback and misspecifications in the underlying forward model. Overall, our results suggest that combining the strengths of deep learning and Bayesian optimization could significantly improve the perceptual experience of patients fitted with visual prostheses and may prove a viable solution for a range of neuroprosthetic technologies.  ( 3 min )
    Optimal Sensor Placement with Adaptive Constraints for Nuclear Digital Twins. (arXiv:2306.13637v1 [math.OC])
    Given harsh operating conditions and physical constraints in reactors, nuclear applications cannot afford to equip the physical asset with a large array of sensors. Therefore, it is crucial to carefully determine the placement of sensors within the given spatial limitations, enabling the reconstruction of reactor flow fields and the creation of nuclear digital twins. Various design considerations are imposed, such as predetermined sensor locations, restricted areas within the reactor, a fixed number of sensors allocated to a specific region, or sensors positioned at a designated distance from one another. We develop a data-driven technique that integrates constraints into an optimization procedure for sensor placement, aiming to minimize reconstruction errors. Our approach employs a greedy algorithm that can optimize sensor locations on a grid, adhering to user-defined constraints. We demonstrate the near optimality of our algorithm by computing all possible configurations for selecting a certain number of sensors for a randomly generated state space system. In this work, the algorithm is demonstrated on the Out-of-Pile Testing and Instrumentation Transient Water Irradiation System (OPTI-TWIST) prototype vessel, which is electrically heated to mimic the neutronics effect of the Transient Reactor Test facility (TREAT) at Idaho National Laboratory (INL). The resulting sensor-based reconstruction of temperature within the OPTI-TWIST minimizes error, provides probabilistic bounds for noise-induced uncertainty and will finally be used for communication between the digital twin and experimental facility.  ( 2 min )
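A greedy, constraint-aware selection loop of the kind described here can be sketched as follows. The objective below (log-determinant of the Gram matrix of the sampled basis rows, a standard proxy for reconstruction quality) and all names are illustrative stand-ins for the paper's formulation; the constraint is expressed simply as an allowed-location list.

```python
import numpy as np

def greedy_constrained_placement(U, n_sensors, allowed):
    """Greedily add, among allowed grid locations, the sensor row that
    maximizes det of the Gram matrix of the selected basis rows."""
    selected = []
    for _ in range(n_sensors):
        best, best_score = None, -np.inf
        for i in allowed:
            if i in selected:
                continue
            rows = U[selected + [i], :]
            # squared volume spanned by the selected rows
            sign, logdet = np.linalg.slogdet(rows @ rows.T)
            score = logdet if sign > 0 else -np.inf
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected

rng = np.random.default_rng(0)
U = rng.standard_normal((20, 3))            # 20 candidate grid points, 3 POD modes
allowed = [i for i in range(20) if i not in (0, 1, 2)]   # a restricted region
sensors = greedy_constrained_placement(U, n_sensors=3, allowed=allowed)
```

Because each constraint (forbidden regions, predetermined sensors, per-region quotas) simply shrinks or partitions the candidate set at each greedy step, this structure accommodates the design considerations listed in the abstract without changing the optimization loop.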
    DeepGraviLens: a Multi-Modal Architecture for Classifying Gravitational Lensing Data. (arXiv:2205.00701v4 [astro-ph.IM] UPDATED)
    Gravitational lensing is the relativistic effect generated by massive bodies, which bend the space-time surrounding them. It is a deeply investigated topic in astrophysics and allows validating theoretical relativistic results and studying faint astrophysical objects that would not be visible otherwise. In recent years Machine Learning methods have been applied to support the analysis of the gravitational lensing phenomena by detecting lensing effects in data sets consisting of images associated with brightness variation time series. However, the state-of-the-art approaches either consider only images and neglect time-series data or achieve relatively low accuracy on the most difficult data sets. This paper introduces DeepGraviLens, a novel multi-modal network that classifies spatio-temporal data belonging to one non-lensed system type and three lensed system types. It surpasses the current state-of-the-art accuracy results by $\approx 3\%$ to $\approx 11\%$, depending on the considered data set. Such an improvement will enable the acceleration of the analysis of lensed objects in upcoming astrophysical surveys, which will exploit the petabytes of data collected, e.g., from the Vera C. Rubin Observatory.  ( 3 min )
    Variational Counterfactual Prediction under Runtime Domain Corruption. (arXiv:2306.13271v1 [cs.LG])
    To date, various neural methods have been proposed for causal effect estimation based on observational data, where a default assumption is the same distribution and availability of variables at both training and inference (i.e., runtime) stages. However, distribution shift (i.e., domain shift) could happen during runtime, and bigger challenges arise from the impaired accessibility of variables. This is commonly caused by increasing privacy and ethical concerns, which can make arbitrary variables unavailable in the entire runtime data and imputation impractical. We term the co-occurrence of domain shift and inaccessible variables runtime domain corruption, which seriously impairs the generalizability of a trained counterfactual predictor. To counter runtime domain corruption, we subsume counterfactual prediction under the notion of domain adaptation. Specifically, we upper-bound the error w.r.t. the target domain (i.e., runtime covariates) by the sum of source domain error and inter-domain distribution distance. In addition, we build an adversarially unified variational causal effect model, named VEGAN, with a novel two-stage adversarial domain adaptation scheme to reduce the latent distribution disparity between treated and control groups first, and between training and runtime variables afterwards. We demonstrate that VEGAN outperforms other state-of-the-art baselines on individual-level treatment effect estimation in the presence of runtime domain corruption on benchmark datasets.  ( 2 min )
    Convergence proof for stochastic gradient descent in the training of deep neural networks with ReLU activation for constant target functions. (arXiv:2112.07369v2 [cs.LG] UPDATED)
    In many numerical simulations stochastic gradient descent (SGD) type optimization methods perform very effectively in the training of deep neural networks (DNNs), but to this day it remains an open research problem to provide a mathematical convergence analysis which rigorously explains the success of SGD type optimization methods in the training of DNNs. In this work we study SGD type optimization methods in the training of fully-connected feedforward DNNs with rectified linear unit (ReLU) activation. We first establish general regularity properties for the risk functions and their generalized gradient functions appearing in the training of such DNNs and, thereafter, we investigate the plain vanilla SGD optimization method in the training of such DNNs under the assumption that the target function under consideration is a constant function. Specifically, we prove, under the assumption that the learning rates (the step sizes of the SGD optimization method) are sufficiently small but not $L^1$-summable and that the target function is a constant function, that the expectation of the risk of the considered SGD process converges to zero as the number of SGD steps increases to infinity.  ( 3 min )
    BrainNet: Epileptic Wave Detection from SEEG with Hierarchical Graph Diffusion Learning. (arXiv:2306.13101v1 [eess.SP])
    Epilepsy is one of the most serious neurological diseases, affecting 1-2% of the world's population. The diagnosis of epilepsy depends heavily on the recognition of epileptic waves, i.e., disordered electrical brainwave activity in the patient's brain. Existing works have begun to employ machine learning models to detect epileptic waves via cortical electroencephalogram (EEG). However, the recently developed stereoelectrocorticography (SEEG) method provides information in stereo that is more precise than conventional EEG, and has been broadly applied in clinical practice. Therefore, we propose the first data-driven study to detect epileptic waves in a real-world SEEG dataset. While offering new opportunities, SEEG also poses several challenges. In clinical practice, epileptic wave activities are considered to propagate between different regions in the brain. These propagation paths, also known as the epileptogenic network, are deemed to be a key factor in the context of epilepsy surgery. However, the question of how to extract an exact epileptogenic network for each patient remains an open problem in the field of neuroscience. To address these challenges, we propose a novel model (BrainNet) that jointly learns the dynamic diffusion graphs and models the brain wave diffusion patterns. In addition, our model effectively aids in resisting label imbalance and severe noise by employing several self-supervised learning tasks and a hierarchical framework. By experimenting with the extensive real SEEG dataset obtained from multiple patients, we find that BrainNet outperforms several latest state-of-the-art baselines derived from time-series analysis.  ( 3 min )
    The Inductive Bias of Flatness Regularization for Deep Matrix Factorization. (arXiv:2306.13239v1 [cs.LG])
    Recent works on over-parameterized neural networks have shown that the stochasticity in optimizers has the implicit regularization effect of minimizing the sharpness of the loss function (in particular, the trace of its Hessian) over the family zero-loss solutions. More explicit forms of flatness regularization also empirically improve the generalization performance. However, it remains unclear why and when flatness regularization leads to better generalization. This work takes the first step toward understanding the inductive bias of the minimum trace of the Hessian solutions in an important setting: learning deep linear networks from linear measurements, also known as \emph{deep matrix factorization}. We show that for all depth greater than one, with the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters (i.e., the product of all layer matrices), which in turn leads to better generalization. We empirically verify our theoretical findings on synthetic datasets.  ( 2 min )
    On tracking varying bounds when forecasting bounded time series. (arXiv:2306.13428v1 [stat.ML])
    We consider a new framework where a continuous, though bounded, random variable has unobserved bounds that vary over time. In the context of univariate time series, we look at the bounds as parameters of the distribution of the bounded random variable. We introduce an extended log-likelihood estimation and design algorithms to track the bound through online maximum likelihood estimation. Since the resulting optimization problem is not convex, we make use of recent theoretical results on Normalized Gradient Descent (NGD) for quasiconvex optimization, to eventually derive an Online Normalized Gradient Descent algorithm. We illustrate and discuss the workings of our approach based on both simulation studies and a real-world wind power forecasting problem.  ( 2 min )
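The Normalized Gradient Descent step that underpins the tracking algorithm is simple to state: step a fixed distance against the gradient direction, discarding its magnitude, which is what makes NGD suitable for quasiconvex losses. A minimal online sketch follows; the loss, function names, and learning rate are illustrative choices, not the paper's.

```python
import numpy as np

def online_ngd(grad_fn, theta0, stream, lr=0.05):
    """Online Normalized Gradient Descent: after each observation, take a
    unit-norm step against the per-observation gradient direction."""
    theta = np.asarray(theta0, float)
    for x in stream:
        g = np.asarray(grad_fn(theta, x), float)
        norm = np.linalg.norm(g)
        if norm > 0:  # zero gradient: already at a stationary point for this sample
            theta = theta - lr * g / norm
    return theta

# Toy usage: track a constant target with a squared per-observation loss,
# whose gradient is 2 * (theta - x). The estimate settles within one step size.
theta = online_ngd(lambda th, x: 2 * (th - x), 0.0, [5.0] * 500, lr=0.05)
```

Because the step length is fixed, the iterate hovers within `lr` of the target rather than converging exactly; decaying `lr` over time trades tracking speed for precision.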
    Adaptive Planning Search Algorithm for Analog Circuit Verification. (arXiv:2306.13484v1 [cs.AI])
    Integrated circuit verification has gathered considerable interest in recent times. Since these circuits keep growing in complexity year by year, pre-Silicon (pre-SI) verification becomes ever more important, in order to ensure proper functionality. Thus, in order to reduce the time needed for manually verifying ICs, we propose a machine learning (ML) approach, which uses less simulations. This method relies on an initial evaluation set of operating condition configurations (OCCs), in order to train Gaussian process (GP) surrogate models. By using surrogate models, we can propose further, more difficult OCCs. Repeating this procedure for several iterations has shown better GP estimation of the circuit's responses, on both synthetic and real circuits, resulting in a better chance of finding the worst case, or even failures, for certain circuit responses. Thus, we show that the proposed approach is able to provide OCCs closer to the specifications for all circuits and identify a failure (specification violation) for one of the responses of a real circuit.  ( 2 min )
    Minibatch training of neural network ensembles via trajectory sampling. (arXiv:2306.13442v1 [cond-mat.stat-mech])
    Most iterative neural network training methods use estimates of the loss function over small random subsets (or minibatches) of the data to update the parameters, which aid in decoupling the training time from the (often very large) size of the training datasets. Here, we show that a minibatch approach can also be used to train neural network ensembles (NNEs) via trajectory methods in a highly efficient manner. We illustrate this approach by training NNEs to classify images in the MNIST datasets. This method gives an improvement to the training times, allowing it to scale as the ratio of the size of the dataset to that of the average minibatch size which, in the case of MNIST, gives a computational improvement typically of two orders of magnitude. We highlight the advantage of using longer trajectories to represent NNEs, both for improved accuracy in inference and reduced update cost in terms of the samples needed in minibatch updates.  ( 2 min )
    A New Paradigm for Generative Adversarial Networks based on Randomized Decision Rules. (arXiv:2306.13641v1 [stat.ML])
    The Generative Adversarial Network (GAN) was recently introduced in the literature as a novel machine learning method for training generative models. It has many applications in statistics such as nonparametric clustering and nonparametric conditional independence tests. However, training the GAN is notoriously difficult due to the issue of mode collapse, which refers to the lack of diversity among generated data. In this paper, we identify the reasons why the GAN suffers from this issue, and to address it, we propose a new formulation for the GAN based on randomized decision rules. In the new formulation, the discriminator converges to a fixed point while the generator converges to a distribution at the Nash equilibrium. We propose to train the GAN by an empirical Bayes-like method by treating the discriminator as a hyper-parameter of the posterior distribution of the generator. Specifically, we simulate generators from its posterior distribution conditioned on the discriminator using a stochastic gradient Markov chain Monte Carlo (MCMC) algorithm, and update the discriminator using stochastic gradient descent along with simulations of the generators. We establish convergence of the proposed method to the Nash equilibrium. Apart from image generation, we apply the proposed method to nonparametric clustering and nonparametric conditional independence tests. A portion of the numerical results is presented in the supplementary material.  ( 2 min )
    SpeedLimit: Neural Architecture Search for Quantized Transformer Models. (arXiv:2209.12127v2 [cs.LG] UPDATED)
    While research in the field of transformer models has primarily focused on enhancing performance metrics such as accuracy and perplexity, practical applications in industry often necessitate a rigorous consideration of inference latency constraints. Addressing this challenge, we introduce SpeedLimit, a novel Neural Architecture Search (NAS) technique that optimizes accuracy whilst adhering to an upper-bound latency constraint. Our method incorporates 8-bit integer quantization in the search process to outperform the current state-of-the-art technique. Our results underline the feasibility and efficacy of seeking an optimal balance between performance and latency, providing new avenues for deploying state-of-the-art transformer models in latency-sensitive environments.  ( 2 min )
    PU GNN: Chargeback Fraud Detection in P2E MMORPGs via Graph Attention Networks with Imbalanced PU Labels. (arXiv:2211.08604v7 [cs.LG] UPDATED)
    The recent advent of play-to-earn (P2E) systems in massively multiplayer online role-playing games (MMORPGs) has made in-game goods interchangeable with real-world values more than ever before. The goods in P2E MMORPGs can be directly exchanged for cryptocurrencies such as Bitcoin, Ethereum, or Klaytn via blockchain networks. Unlike traditional in-game goods, once written to the blockchain, P2E goods cannot be restored by the game operation teams, even in cases of chargeback fraud such as payment fraud, cancellation, or refund. To tackle the problem, we propose a novel chargeback fraud prediction method, PU GNN, which leverages graph attention networks with PU loss to capture both the players' in-game behavior and their P2E token transaction patterns. With the adoption of modified GraphSMOTE, the proposed model handles the imbalanced distribution of labels in chargeback fraud datasets. Experiments conducted on three real-world P2E MMORPG datasets demonstrate that PU GNN achieves superior performance over previously suggested methods.  ( 3 min )
    Efficient Online Processing with Deep Neural Networks. (arXiv:2306.13474v1 [cs.LG])
    The capabilities and adoption of deep neural networks (DNNs) grow at an exhilarating pace: Vision models accurately classify human actions in videos and identify cancerous tissue in medical scans as precisely as human experts; large language models answer wide-ranging questions, generate code, and write prose, becoming the topic of everyday dinner-table conversations. Even though their uses are exhilarating, the continually increasing model sizes and computational complexities have a dark side. The economic cost and negative environmental externalities of training and serving models are in evident disharmony with financial viability and climate action goals. Instead of pursuing yet another increase in predictive performance, this dissertation is dedicated to the improvement of neural network efficiency. Specifically, a core contribution addresses the efficiency aspects during online inference. Here, the concept of Continual Inference Networks (CINs) is proposed and explored across four publications. CINs extend prior state-of-the-art methods developed for offline processing of spatio-temporal data and reuse their pre-trained weights, improving their online processing efficiency by an order of magnitude. These advances are attained through a bottom-up computational reorganization and judicious architectural modifications. The benefit to online inference is demonstrated by reformulating several widely used network architectures into CINs, including 3D CNNs, ST-GCNs, and Transformer Encoders. An orthogonal contribution tackles the concurrent adaptation and computational acceleration of a large source model into multiple lightweight derived models. Drawing on fusible adapter networks and structured pruning, Structured Pruning Adapters achieve superior predictive accuracy under aggressive pruning using significantly fewer learned weights compared to fine-tuning with pruning.  ( 2 min )
    DiMSam: Diffusion Models as Samplers for Task and Motion Planning under Partial Observability. (arXiv:2306.13196v1 [cs.RO])
    Task and Motion Planning (TAMP) approaches are effective at planning long-horizon autonomous robot manipulation. However, because they require a planning model, it can be difficult to apply them to domains where the environment and its dynamics are not fully known. We propose to overcome these limitations by leveraging deep generative modeling, specifically diffusion models, to learn constraints and samplers that capture these difficult-to-engineer aspects of the planning model. These learned samplers are composed and combined within a TAMP solver in order to find action parameter values jointly that satisfy the constraints along a plan. To tractably make predictions for unseen objects in the environment, we define these samplers on low-dimensional learned latent embeddings of changing object state. We evaluate our approach in an articulated object manipulation domain and show how the combination of classical TAMP, generative learning, and latent embeddings enables long-horizon constraint-based reasoning.  ( 2 min )
    An Agnostic View on the Cost of Overfitting in (Kernel) Ridge Regression. (arXiv:2306.13185v1 [stat.ML])
    We study the cost of overfitting in noisy kernel ridge regression (KRR), which we define as the ratio between the test error of the interpolating ridgeless model and the test error of the optimally-tuned model. We take an "agnostic" view in the following sense: we consider the cost as a function of sample size for any target function, even if the sample size is not large enough for consistency or the target is outside the RKHS. We analyze the cost of overfitting under a Gaussian universality ansatz using recently derived (non-rigorous) risk estimates in terms of the task eigenstructure. Our analysis provides a more refined characterization of benign, tempered and catastrophic overfitting (cf. Mallinar et al. 2022).  ( 2 min )
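The quantity being studied, the ratio of the ridgeless test error to the best test error over the ridge parameter, can be computed numerically on a toy problem. This sketch is only a numerical illustration of the definition, not the paper's theoretical analysis; the RBF kernel, data model, and grid of ridge parameters are arbitrary choices.

```python
import numpy as np

def rbf(X, Z, gamma=1.0):
    """Gaussian (RBF) kernel matrix between row-wise sample sets X and Z."""
    d = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def krr_test_err(Xtr, ytr, Xte, yte, lam):
    """Mean squared test error of kernel ridge regression with ridge lam;
    lam = 0 gives the interpolating (ridgeless) solution via pseudoinverse."""
    K = rbf(Xtr, Xtr)
    if lam > 0:
        alpha = np.linalg.solve(K + lam * np.eye(len(Xtr)), ytr)
    else:
        alpha = np.linalg.pinv(K) @ ytr
    pred = rbf(Xte, Xtr) @ alpha
    return np.mean((pred - yte) ** 2)

def cost_of_overfitting(Xtr, ytr, Xte, yte, lams=np.logspace(-4, 2, 20)):
    """Ratio of the ridgeless test error to the best test error over a grid
    of ridge parameters (a stand-in for the optimally-tuned model)."""
    err0 = krr_test_err(Xtr, ytr, Xte, yte, 0.0)
    best = min(krr_test_err(Xtr, ytr, Xte, yte, l) for l in lams)
    return err0 / best

rng = np.random.default_rng(0)
Xtr = rng.uniform(-3, 3, (40, 1))
ytr = np.sin(Xtr[:, 0]) + 0.5 * rng.normal(size=40)  # noisy training labels
Xte = np.linspace(-3, 3, 200)[:, None]
yte = np.sin(Xte[:, 0])                               # clean test targets
cost = cost_of_overfitting(Xtr, ytr, Xte, yte)
```

With noisy labels the interpolating solution typically fits the noise, so the ratio tends to exceed one; a finer grid approximates the truly optimal tuning.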
    Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok. (arXiv:2306.13253v1 [cs.LG])
    This paper focuses on predicting the occurrence of grokking in neural networks, a phenomenon in which perfect generalization emerges long after signs of overfitting or memorization are observed. It has been reported that grokking can only be observed with certain hyper-parameters. This makes it critical to identify the parameters that lead to grokking. However, since grokking occurs after a large number of epochs, searching for the hyper-parameters that lead to it is time-consuming. In this paper, we propose a low-cost method to predict grokking without training for a large number of epochs. In essence, by studying the learning curve of the first few epochs, we show that one can predict whether grokking will occur later on. Specifically, if certain oscillations occur in the early epochs, one can expect grokking to occur if the model is trained for a much longer period of time. We propose using the spectral signature of a learning curve derived by applying the Fourier transform to quantify the amplitude of low-frequency components to detect the presence of such oscillations. We also present additional experiments aimed at explaining the cause of these oscillations and characterizing the loss landscape.  ( 3 min )
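The spectral test described above, applying the Fourier transform to the early learning curve and summing the amplitude of low-frequency components, can be sketched directly. The cutoff frequency here is an illustrative choice, not a value from the paper.

```python
import numpy as np

def low_freq_amplitude(curve, cutoff=0.1):
    """Spectral signature of a learning curve: total amplitude of the
    low-frequency Fourier components (DC term excluded), used as an
    indicator of the early oscillations associated with later grokking."""
    curve = np.asarray(curve, float) - np.mean(curve)  # remove the DC offset
    amps = np.abs(np.fft.rfft(curve))
    freqs = np.fft.rfftfreq(len(curve))  # in cycles per epoch
    return amps[(freqs > 0) & (freqs <= cutoff)].sum()

# Toy usage: an oscillating early loss curve versus a smooth one.
t = np.arange(200)
smooth = np.exp(-t / 100)
osc = smooth + 0.2 * np.sin(2 * np.pi * 0.05 * t)
```

A curve carrying slow oscillations scores markedly higher than a smooth decay of the same overall shape, which is the signal one would threshold on to decide whether longer training is worthwhile.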
    Higher-order Motif-based Time Series Classification for Forced Oscillation Source Location in Power Grids. (arXiv:2306.13397v1 [cs.LG])
    Time series motifs are used for discovering higher-order structures of time series data. Based on time series motifs, the motif embedding correlation field (MECF) is proposed to characterize higher-order temporal structures of dynamical system time series. A MECF-based unsupervised learning approach is applied in locating the source of the forced oscillation (FO), a periodic disturbance that detrimentally impacts power grids. Locating the FO source is imperative for system stability. Compared with the Fourier analysis, the MECF-based unsupervised learning is applicable under various FO situations, including the single FO, FO with resonance, and multiple-source FOs. The MECF-based unsupervised learning is a data-driven approach without any prior knowledge requirement of system models or topologies. Tests on the UK high-voltage transmission grid illustrate the effectiveness of MECF-based unsupervised learning. In addition, the impacts of coupling strength and measurement noise on locating the FO source by the MECF-based unsupervised learning are investigated.  ( 2 min )
    Diverse Community Data for Benchmarking Data Privacy Algorithms. (arXiv:2306.13216v1 [cs.CR])
    The Diverse Communities Data Excerpts are the core of a National Institute of Standards and Technology (NIST) program to strengthen understanding of tabular data deidentification technologies such as synthetic data. Synthetic data is an ambitious attempt to democratize the benefits of big data; it uses generative models to recreate sensitive personal data with new records for public release. However, it is vulnerable to the same bias and privacy issues that impact other machine learning applications, and can even amplify those issues. When deidentified data distributions introduce bias or artifacts, or leak sensitive information, they propagate these problems to downstream applications. Furthermore, real-world survey conditions such as diverse subpopulations, heterogeneous non-ordinal data spaces, and complex dependencies between features pose specific challenges for synthetic data algorithms. These observations motivate the need for real, diverse, and complex benchmark data to support a robust understanding of algorithm behavior. This paper introduces four contributions: new theoretical work on the relationship between diverse populations and challenges for equitable deidentification; public benchmark data focused on diverse populations and challenging features curated from the American Community Survey; an open source suite of evaluation metrology for deidentified datasets; and an archive of evaluation results on a broad collection of deidentification techniques. The initial set of evaluation results demonstrate the suitability of these tools for investigations in this field.  ( 2 min )
    DLKoopman: A deep learning software package for Koopman theory. (arXiv:2211.08992v2 [cs.LG] UPDATED)
    We present DLKoopman -- a software package for Koopman theory that uses deep learning to learn an encoding of a nonlinear dynamical system into a linear space, while simultaneously learning the linear dynamics. While several previous efforts have either restricted the ability to learn encodings, or been bespoke efforts designed for specific systems, DLKoopman is a generalized tool that can be applied to data-driven learning and optimization of any dynamical system. It can either be trained on data from individual states (snapshots) of a system and used to predict its unknown states, or trained on data from trajectories of a system and used to predict unknown trajectories for new initial states. DLKoopman is available on the Python Package Index (PyPI) as 'dlkoopman', and includes extensive documentation and tutorials. Additional contributions of the package include a novel metric called Average Normalized Absolute Error for evaluating performance, and a ready-to-use hyperparameter search module for improving performance.  ( 2 min )
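The core linear-dynamics step at the heart of Koopman-style learning, fitting an operator K so that consecutive snapshots satisfy x_{t+1} ≈ K x_t, can be illustrated with a least-squares fit. This is not DLKoopman's API; it is a minimal sketch in the raw state space (the package additionally learns a nonlinear encoding in which such a K exists), and the function names are assumptions.

```python
import numpy as np

def fit_linear_dynamics(X, Xnext):
    """Least-squares fit of a linear operator K with Xnext ≈ K @ X,
    where X and Xnext hold paired snapshots as columns (d x m)."""
    # lstsq solves X.T @ M = Xnext.T for M = K.T, hence the transposes.
    K, *_ = np.linalg.lstsq(X.T, Xnext.T, rcond=None)
    return K.T

def predict_trajectory(K, x0, steps):
    """Roll the fitted linear dynamics forward from an initial state."""
    traj = [np.asarray(x0, float)]
    for _ in range(steps):
        traj.append(K @ traj[-1])
    return np.stack(traj)
```

On data generated by an actual linear system the fit is exact given enough independent snapshots, and the rolled-out trajectory matches the true one; on nonlinear systems this is where a learned encoder earns its keep.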
    Variance-Covariance Regularization Improves Representation Learning. (arXiv:2306.13292v1 [cs.LG])
    Transfer learning has emerged as a key approach in the machine learning domain, enabling the application of knowledge derived from one domain to improve performance on subsequent tasks. Given the often limited information about these subsequent tasks, a strong transfer learning approach calls for the model to capture a diverse range of features during the initial pretraining stage. However, recent research suggests that, without sufficient regularization, the network tends to concentrate on features that primarily reduce the pretraining loss function. This tendency can result in inadequate feature learning and impaired generalization capability for target tasks. To address this issue, we propose Variance-Covariance Regularization (VCR), a regularization technique aimed at fostering diversity in the learned network features. Drawing inspiration from recent advancements in the self-supervised learning approach, our approach promotes learned representations that exhibit high variance and minimal covariance, thus preventing the network from focusing solely on loss-reducing features. We empirically validate the efficacy of our method through comprehensive experiments coupled with in-depth analytical studies on the learned representations. In addition, we develop an efficient implementation strategy that assures minimal computational overhead associated with our method. Our results indicate that VCR is a powerful and efficient method for enhancing transfer learning performance for both supervised learning and self-supervised learning, opening new possibilities for future research in this domain.  ( 2 min )
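A variance-covariance regularizer of the kind described, rewarding per-dimension variance and penalizing off-diagonal covariance of a batch of representations, can be sketched as follows. This is a generic sketch in the spirit of VICReg-style objectives, not VCR's exact loss; the variance target and scaling are illustrative assumptions.

```python
import numpy as np

def vc_regularizer(Z, var_target=1.0, eps=1e-4):
    """Variance-covariance penalty on a batch of representations Z (n x d):
    hinge-penalize dimensions whose std falls below var_target, and
    penalize squared off-diagonal covariance entries (redundant features)."""
    Z = Z - Z.mean(0)
    std = np.sqrt(Z.var(0) + eps)
    var_loss = np.mean(np.maximum(0.0, var_target - std))
    n, d = Z.shape
    cov = (Z.T @ Z) / (n - 1)
    off = cov - np.diag(np.diag(cov))
    cov_loss = (off ** 2).sum() / d
    return var_loss + cov_loss
```

A batch of independent, unit-variance features incurs almost no penalty, while collapsed representations (every dimension a copy of the same feature) are heavily penalized through the covariance term, which is exactly the failure mode the regularizer is meant to prevent.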
    Weak Signal Asymptotics for Sequentially Randomized Experiments. (arXiv:2101.09855v7 [math.ST] UPDATED)
    We use the lens of weak signal asymptotics to study a class of sequentially randomized experiments, including those that arise in solving multi-armed bandit problems. In an experiment with $n$ time steps, we let the mean reward gaps between actions scale to the order $1/\sqrt{n}$ so as to preserve the difficulty of the learning task as $n$ grows. In this regime, we show that the sample paths of a class of sequentially randomized experiments -- adapted to this scaling regime and with arm selection probabilities that vary continuously with state -- converge weakly to a diffusion limit, given as the solution to a stochastic differential equation. The diffusion limit enables us to derive refined, instance-specific characterization of stochastic dynamics, and to obtain several insights on the regret and belief evolution of a number of sequential experiments including Thompson sampling (but not UCB, which does not satisfy our continuity assumption). We show that all sequential experiments whose randomization probabilities have a Lipschitz-continuous dependence on the observed data suffer from sub-optimal regret performance when the reward gaps are relatively large. Conversely, we find that a version of Thompson sampling with an asymptotically uninformative prior variance achieves near-optimal instance-specific regret scaling, including with large reward gaps, but these good regret properties come at the cost of highly unstable posterior beliefs.  ( 3 min )
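For readers unfamiliar with the experiments being analyzed, here is a minimal Thompson sampling loop for a Gaussian bandit: sample a mean for each arm from its posterior and pull the argmax, so arm-selection probabilities vary continuously with the observed data. This illustrates the class of experiment only; it is not the paper's diffusion-limit analysis, and the prior and noise settings are illustrative.

```python
import numpy as np

def thompson_gaussian(reward_fn, n_steps, n_arms=2,
                      prior_var=1.0, noise_var=1.0, seed=0):
    """Thompson sampling with Gaussian rewards and a zero-mean Gaussian
    prior on each arm's mean; returns the sequence of pulled arms."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)
    pulls = []
    for _ in range(n_steps):
        # Conjugate Gaussian posterior for each arm's mean reward.
        post_prec = 1.0 / prior_var + counts / noise_var
        post_mean = (sums / noise_var) / post_prec
        post_std = np.sqrt(1.0 / post_prec)
        arm = int(np.argmax(rng.normal(post_mean, post_std)))
        r = reward_fn(arm, rng)
        counts[arm] += 1
        sums[arm] += r
        pulls.append(arm)
    return np.array(pulls)

# Toy usage: two arms with true means 0 and 1; sampling concentrates on arm 1.
pulls = thompson_gaussian(lambda a, rng: rng.normal([0.0, 1.0][a], 1.0), 2000)
```

The weak-signal regime studied in the paper corresponds to shrinking the gap between the two arm means toward order 1/sqrt(n) as the horizon grows.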
    AmicroN: A Framework for Generating Annotations for Human Activity Recognition with Granular Micro-Activities. (arXiv:2306.13149v1 [cs.HC])
    Efficient human activity recognition (HAR) using sensor data needs a significant volume of annotated data. The growing volume of unlabelled sensor data has challenged conventional practices for gathering HAR annotations with human-in-the-loop approaches, often leading to the collection of shallower annotations. These shallower annotations ignore the fine-grained micro-activities that constitute any complex activities of daily living (ADL). Understanding this, we, in this paper, first analyze this lack of granular annotations from available pre-annotated datasets to understand the practical inconsistencies and also perform a detailed survey to look into the human perception surrounding annotations. Drawing motivations from these, we next develop the framework AmicroN that can automatically generate micro-activity annotations using locomotive signatures and the available coarse-grain macro-activity labels. In the backend, AmicroN applies change-point detection followed by zero-shot learning with activity embeddings to identify the unseen micro-activities in an unsupervised manner. Rigorous evaluation on publicly available datasets shows that AmicroN can accurately generate micro-activity annotations with a median F1-score of >0.75. Additionally, we also show that AmicroN can be used in a plug-and-play manner with Large Language Models (LLMs) to obtain the micro-activity labels, thus making it more practical for realistic applications.  ( 2 min )
    Visual Adversarial Examples Jailbreak Large Language Models. (arXiv:2306.13213v1 [cs.CR])
    Recently, there has been a surge of interest in introducing vision into Large Language Models (LLMs). The proliferation of large Visual Language Models (VLMs), such as Flamingo, BLIP-2, and GPT-4, signifies an exciting convergence of advancements in both visual and language foundation models. Yet, the risks associated with this integrative approach are largely unexamined. In this paper, we shed light on the security and safety implications of this trend. First, we underscore that the continuous and high-dimensional nature of the additional visual input space intrinsically makes it a fertile ground for adversarial attacks. This unavoidably expands the attack surfaces of LLMs. Second, we highlight that the broad functionality of LLMs also presents visual attackers with a wider array of achievable adversarial objectives, extending the implications of security failures beyond mere misclassification. To elucidate these risks, we study adversarial examples in the visual input space of a VLM. Specifically, against MiniGPT-4, which incorporates safety mechanisms that can refuse harmful instructions, we present visual adversarial examples that can circumvent the safety mechanisms and provoke harmful behaviors of the model. Remarkably, we discover that adversarial examples, even if optimized on a narrow, manually curated derogatory corpus against specific social groups, can universally jailbreak the model's safety mechanisms. A single such adversarial example can generally undermine MiniGPT-4's safety, enabling it to heed a wide range of harmful instructions and produce harmful content far beyond simply imitating the derogatory corpus used in optimization. Unveiling these risks, we accentuate the urgent need for comprehensive risk assessments, robust defense strategies, and the implementation of responsible practices for the secure and safe utilization of VLMs.  ( 3 min )
    Improving Log-Cumulant Based Estimation of Roughness Information in SAR imagery. (arXiv:2306.13200v1 [cs.CV])
    Synthetic Aperture Radar (SAR) image understanding is crucial in remote sensing applications, but it is hindered by its intrinsic noise contamination, called speckle. Sophisticated statistical models, such as the $\mathcal{G}^0$ family of distributions, have been applied to SAR data, and many of the current advancements in processing this imagery have been accomplished through extracting information from these models. In this paper, we propose improvements to parameter estimation in $\mathcal{G}^0$ distributions using the Method of Log-Cumulants. First, using Bayesian modeling, we construct estimators that regularly produce reliable roughness estimates under both $\mathcal{G}^0_A$ and $\mathcal{G}^0_I$ models. Second, we make use of an approximation of the Trigamma function to compute the estimated roughness in constant time, making it considerably faster than the existing method for this task. Finally, we show how we can use this method to achieve fast and reliable SAR image understanding based on roughness information.  ( 2 min )
    Synthetic data shuffling accelerates the convergence of federated learning under data heterogeneity. (arXiv:2306.13263v1 [cs.LG])
    In federated learning, data heterogeneity is a critical challenge. A straightforward solution is to shuffle the clients' data to homogenize the distribution. However, this may violate data access rights, and how and when shuffling can accelerate the convergence of a federated optimization algorithm is not theoretically well understood. In this paper, we establish a precise and quantifiable correspondence between data heterogeneity and parameters in the convergence rate when a fraction of data is shuffled across clients. We prove that shuffling can quadratically reduce the gradient dissimilarity with respect to the shuffling percentage, accelerating convergence. Inspired by the theory, we propose a practical approach that addresses the data access rights issue by shuffling locally generated synthetic data. The experimental results show that shuffling synthetic data improves the performance of multiple existing federated learning algorithms by a large margin.  ( 2 min )
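    The partial-shuffling idea above can be sketched in a few lines. This is an illustrative toy, not the paper's algorithm: the function name `shuffle_synthetic` and the Gaussian toy clients are our assumptions, and each client's synthetic samples are treated as plain arrays.

```python
import numpy as np

def shuffle_synthetic(client_data, synth_frac, rng):
    """Pool a fraction of each client's (synthetic) samples and
    redistribute the pool uniformly across clients.

    client_data: list of (n_i, d) arrays; synth_frac: fraction to shuffle.
    """
    pooled, kept = [], []
    for X in client_data:
        n_shuffle = int(len(X) * synth_frac)
        idx = rng.permutation(len(X))
        pooled.append(X[idx[:n_shuffle]])
        kept.append(X[idx[n_shuffle:]])
    pool = np.concatenate(pooled)
    pool = pool[rng.permutation(len(pool))]
    # split the shuffled pool back evenly among clients
    splits = np.array_split(pool, len(client_data))
    return [np.concatenate([k, s]) for k, s in zip(kept, splits)]

rng = np.random.default_rng(0)
# heterogeneous clients: each has a different mean
clients = [rng.normal(i, 1.0, size=(100, 2)) for i in range(4)]
mixed = shuffle_synthetic(clients, synth_frac=0.5, rng=rng)
# after shuffling, client means move toward the global mean,
# i.e. gradient dissimilarity across clients shrinks
```

    The point of the sketch is only the mechanism: shuffling a fraction of samples provably reduces gradient dissimilarity quadratically in the shuffling percentage, per the abstract.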
    Semi-Autoregressive Energy Flows: Exploring Likelihood-Free Training of Normalizing Flows. (arXiv:2206.06672v2 [cs.LG] UPDATED)
    Training normalizing flow generative models can be challenging due to the need to calculate computationally expensive determinants of Jacobians. This paper studies the likelihood-free training of flows and proposes the energy objective, an alternative sample-based loss based on proper scoring rules. The energy objective is determinant-free and supports flexible model architectures that are not easily compatible with maximum likelihood training, including semi-autoregressive energy flows, a novel model family that interpolates between fully autoregressive and non-autoregressive models. Energy flows feature competitive sample quality, posterior inference, and generation speed relative to likelihood-based flows; this performance is decorrelated from the quality of log-likelihood estimates, which are generally very poor. Our findings question the use of maximum likelihood as an objective or a metric, and contribute to a scientific study of its role in generative modeling.  ( 2 min )
    Can Continual Learning Improve Long-Tailed Recognition? Toward a Unified Framework. (arXiv:2306.13275v1 [cs.LG])
    The Long-Tailed Recognition (LTR) problem emerges in the context of learning from highly imbalanced datasets, in which the number of samples among different classes is heavily skewed. LTR methods aim to accurately learn a dataset comprising both a larger Head set and a smaller Tail set. We prove a theorem showing that, under the assumption of strong convexity of the loss function, the weights of a learner trained on the full dataset are within an upper bound of the weights of the same learner trained strictly on the Head. Next, we assert that by treating the learning of the Head and Tail as two separate and sequential steps, Continual Learning (CL) methods can effectively update the weights of the learner to learn the Tail without forgetting the Head. First, we validate our theoretical findings with various experiments on the toy MNIST-LT dataset. We then evaluate the efficacy of several CL strategies on multiple imbalanced variations of two standard LTR benchmarks (CIFAR100-LT and CIFAR10-LT), and show that standard CL methods achieve strong performance gains in comparison to baselines and approach solutions that have been tailor-made for LTR. We also assess the applicability of CL techniques on real-world data by exploring CL on the naturally imbalanced Caltech256 dataset and demonstrate its superiority over state-of-the-art classifiers. Our work not only unifies LTR and CL but also paves the way for leveraging advances in CL methods to tackle the LTR challenge more effectively.  ( 3 min )
    Precise Asymptotic Generalization for Multiclass Classification with Overparameterized Linear Models. (arXiv:2306.13255v1 [cs.LG])
    We study the asymptotic generalization of an overparameterized linear model for multiclass classification under the Gaussian covariates bi-level model introduced in Subramanian et al. '22, where the number of data points, features, and classes all grow together. We fully resolve the conjecture posed in Subramanian et al. '22, matching the predicted regimes for generalization. Furthermore, our new lower bounds are akin to an information-theoretic strong converse: they establish that the misclassification rate goes to 0 or 1 asymptotically. One surprising consequence of our tight results is that the min-norm interpolating classifier can be asymptotically suboptimal relative to noninterpolating classifiers in the regime where the min-norm interpolating regressor is known to be optimal. The key to our tight analysis is a new variant of the Hanson-Wright inequality which is broadly useful for multiclass problems with sparse labels. As an application, we show that the same type of analysis can be used to analyze the related multilabel classification problem under the same bi-level ensemble.  ( 2 min )
    Uniform Convergence with Square-Root Lipschitz Loss. (arXiv:2306.13188v1 [stat.ML])
    We establish generic uniform convergence guarantees for Gaussian data in terms of the Rademacher complexity of the hypothesis class and the Lipschitz constant of the square root of the scalar loss function. We show how these guarantees substantially generalize previous results based on smoothness (Lipschitz constant of the derivative), and allow us to handle the broader class of square-root-Lipschitz losses, which includes also non-smooth loss functions appropriate for studying phase retrieval and ReLU regression, as well as rederive and better understand "optimistic rate" and interpolation learning guarantees.
    Two derivations of Principal Component Analysis on datasets of distributions. (arXiv:2306.13503v1 [stat.ML])
    In this brief note, we formulate Principal Component Analysis (PCA) over datasets consisting not of points but of distributions, characterized by their location and covariance. Just like the usual PCA on points can be equivalently derived via a variance-maximization principle and via a minimization of reconstruction error, we derive a closed-form solution for distributional PCA from both of these perspectives.
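    One plausible closed form consistent with the variance-maximization view is to eigendecompose the sum of the between-distribution scatter of the locations and the average within-distribution covariance. This is our assumption for illustration, not necessarily the note's exact derivation, and the function name `distributional_pca` is ours:

```python
import numpy as np

def distributional_pca(means, covs, k):
    """PCA over a dataset of distributions, each given by a mean vector
    and a covariance matrix. Sketch (assumption): total variance =
    between-distribution scatter of the means + average within-distribution
    covariance, i.e. eigendecompose S = Cov(means) + mean(covs)."""
    means = np.asarray(means, dtype=float)
    centered = means - means.mean(axis=0)
    between = centered.T @ centered / len(means)   # scatter of the locations
    within = np.mean(covs, axis=0)                 # average covariance
    S = between + within
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]                # descending eigenvalues
    return evals[order][:k], evecs[:, order[:k]]

# toy example: three 2-D Gaussians spread along the x-axis
means = [[0.0, 0.0], [2.0, 0.0], [4.0, 0.0]]
covs = [np.eye(2) * 0.1] * 3
vals, vecs = distributional_pca(means, covs, k=1)
# the top component aligns with the x-axis, where the locations vary
```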
    Best Arm Identification in Stochastic Bandits: Beyond $\beta-$optimality. (arXiv:2301.03785v2 [stat.ML] UPDATED)
    This paper investigates a hitherto unaddressed aspect of best arm identification (BAI) in stochastic multi-armed bandits in the fixed-confidence setting. Two key metrics for assessing bandit algorithms are computational efficiency and performance optimality (e.g., in sample complexity). In stochastic BAI literature, there have been advances in designing algorithms to achieve optimal performance, but they are generally computationally expensive to implement (e.g., optimization-based methods). There also exist approaches with high computational efficiency, but they have provable gaps to the optimal performance (e.g., the $\beta$-optimal approaches in top-two methods). This paper introduces a framework and an algorithm for BAI that achieves optimal performance with a computationally efficient set of decision rules. The central process that facilitates this is a routine for sequentially estimating the optimal allocations up to sufficient fidelity. Specifically, these estimates are accurate enough for identifying the best arm (hence, achieving optimality) but not overly accurate to an unnecessary extent that creates excessive computational complexity (hence, maintaining efficiency). Furthermore, the existing relevant literature focuses on the family of exponential distributions. This paper considers a more general setting of any arbitrary family of distributions parameterized by their mean values (under mild regularity conditions). The optimality is established analytically, and numerical evaluations are provided to assess the analytical guarantees and compare the performance with those of the existing ones.
    Approximate Causal Effect Identification under Weak Confounding. (arXiv:2306.13242v1 [stat.ML])
    Causal effect estimation has been studied by many researchers when only observational data is available. Sound and complete algorithms have been developed for pointwise estimation of identifiable causal queries. For non-identifiable causal queries, researchers developed polynomial programs to estimate tight bounds on causal effect. However, these are computationally difficult to optimize for variables with large support sizes. In this paper, we analyze the effect of "weak confounding" on causal estimands. More specifically, under the assumption that the unobserved confounders that render a query non-identifiable have small entropy, we propose an efficient linear program to derive the upper and lower bounds of the causal effect. We show that our bounds are consistent in the sense that as the entropy of unobserved confounders goes to zero, the gap between the upper and lower bound vanishes. Finally, we conduct synthetic and real data simulations to compare our bounds with the bounds obtained by the existing work that cannot incorporate such entropy constraints and show that our bounds are tighter for the setting with weak confounders.
    Differentially Private Synthetic Data Using KD-Trees. (arXiv:2306.13211v1 [cs.CR])
    Creation of a synthetic dataset that faithfully represents the data distribution and simultaneously preserves privacy is a major research challenge. Many space partitioning based approaches have emerged in recent years for answering statistical queries in a differentially private manner. However, for synthetic data generation problem, recent research has been mainly focused on deep generative models. In contrast, we exploit space partitioning techniques together with noise perturbation and thus achieve intuitive and transparent algorithms. We propose both data independent and data dependent algorithms for $\epsilon$-differentially private synthetic data generation whose kernel density resembles that of the real dataset. Additionally, we provide theoretical results on the utility-privacy trade-offs and show how our data dependent approach overcomes the curse of dimensionality and leads to a scalable algorithm. We show empirical utility improvements over the prior work, and discuss performance of our algorithm on a downstream classification task on a real dataset.
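    As a hedged illustration of the data-independent flavor, here is a 1-D equal-width-bin stand-in for the paper's space partitions: Laplace-noised counts (sensitivity 1 per record, so scale $1/\epsilon$) define a distribution from which synthetic points are resampled. The function name and all parameters are illustrative, not the paper's.

```python
import numpy as np

def dp_synthetic_histogram(data, bins, eps, rng, domain=(0.0, 1.0)):
    """Data-independent sketch: partition a 1-D domain into equal-width
    bins, add Laplace(1/eps) noise to each count, clip negatives, and
    resample synthetic points from the resulting noisy histogram."""
    counts, edges = np.histogram(data, bins=bins, range=domain)
    noisy = counts + rng.laplace(scale=1.0 / eps, size=bins)
    noisy = np.clip(noisy, 0.0, None)
    probs = noisy / noisy.sum()
    picks = rng.choice(bins, size=len(data), p=probs)
    # draw each synthetic point uniformly within its selected bin
    lo, hi = edges[picks], edges[picks + 1]
    return rng.uniform(lo, hi)

rng = np.random.default_rng(1)
real = rng.beta(2, 5, size=5000)
synth = dp_synthetic_histogram(real, bins=32, eps=1.0, rng=rng)
# synth roughly matches the real distribution while each record's
# influence on the released histogram is epsilon-DP bounded
```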
    Compression with Bayesian Implicit Neural Representations. (arXiv:2305.19185v2 [cs.LG] UPDATED)
    Many common types of data can be represented as functions that map coordinates to signal values, such as pixel locations to RGB values in the case of an image. Based on this view, data can be compressed by overfitting a compact neural network to its functional representation and then encoding the network weights. However, most current solutions for this are inefficient, as quantization to low-bit precision substantially degrades the reconstruction quality. To address this issue, we propose overfitting variational Bayesian neural networks to the data and compressing an approximate posterior weight sample using relative entropy coding instead of quantizing and entropy coding it. This strategy enables direct optimization of the rate-distortion performance by minimizing the $\beta$-ELBO, and allows targeting different rate-distortion trade-offs for a given network architecture by adjusting $\beta$. Moreover, we introduce an iterative algorithm for learning prior weight distributions and employ a progressive refinement process for the variational posterior that significantly enhances performance. Experiments show that our method achieves strong performance on image and audio compression while retaining simplicity.  ( 2 min )
    Curvature Filtrations for Graph Generative Model Evaluation. (arXiv:2301.12906v2 [cs.LG] UPDATED)
    Graph generative model evaluation necessitates understanding differences between graphs on the distributional level. This entails being able to harness salient attributes of graphs in an efficient manner. Curvature constitutes one such property of graphs, and has recently started to prove useful in characterising graphs. Its expressive properties, stability, and practical utility in model evaluation remain largely unexplored, however. We combine graph curvature descriptors with emerging methods from topological data analysis to obtain robust, expressive descriptors for evaluating graph generative models.  ( 2 min )
    A New Paradigm for Generative Adversarial Networks based on Randomized Decision Rules. (arXiv:2306.13641v1 [stat.ML])
    The Generative Adversarial Network (GAN) was recently introduced in the literature as a novel machine learning method for training generative models. It has many applications in statistics such as nonparametric clustering and nonparametric conditional independence tests. However, training the GAN is notoriously difficult due to the issue of mode collapse, which refers to the lack of diversity among generated data. In this paper, we identify the reasons why the GAN suffers from this issue, and to address it, we propose a new formulation for the GAN based on randomized decision rules. In the new formulation, the discriminator converges to a fixed point while the generator converges to a distribution at the Nash equilibrium. We propose to train the GAN by an empirical Bayes-like method by treating the discriminator as a hyper-parameter of the posterior distribution of the generator. Specifically, we simulate generators from its posterior distribution conditioned on the discriminator using a stochastic gradient Markov chain Monte Carlo (MCMC) algorithm, and update the discriminator using stochastic gradient descent along with simulations of the generators. We establish convergence of the proposed method to the Nash equilibrium. Apart from image generation, we apply the proposed method to nonparametric clustering and nonparametric conditional independence tests. A portion of the numerical results is presented in the supplementary material.  ( 2 min )
    On the Convergence Rate of Gaussianization with Random Rotations. (arXiv:2306.13520v1 [cs.LG])
    Gaussianization is a simple generative model that can be trained without backpropagation. It has shown compelling performance on low dimensional data. As the dimension increases, however, it has been observed that the convergence speed slows down. We show analytically that the number of required layers scales linearly with the dimension for Gaussian input. We argue that this is because the model is unable to capture dependencies between dimensions. Empirically, we find the same linear increase in cost for arbitrary input $p(x)$, but observe favorable scaling for some distributions. We explore potential speed-ups and formulate challenges for further research.  ( 2 min )
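    A single Gaussianization layer of the kind described (random rotation followed by marginal Gaussianization) can be sketched as follows; the empirical-CDF marginal map is one common choice, not necessarily the paper's exact variant:

```python
import numpy as np
from scipy import stats

def gaussianize_layer(X, rng):
    """One Gaussianization layer: a random orthogonal rotation followed by
    marginal Gaussianization of each coordinate, mapping empirical-CDF
    ranks through the standard normal quantile function."""
    d = X.shape[1]
    # random rotation via QR decomposition of a Gaussian matrix
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    Y = X @ Q
    out = np.empty_like(Y)
    n = len(Y)
    for j in range(d):
        ranks = stats.rankdata(Y[:, j])
        u = ranks / (n + 1)             # empirical CDF values in (0, 1)
        out[:, j] = stats.norm.ppf(u)   # marginal Gaussianization
    return out

rng = np.random.default_rng(0)
X = rng.exponential(size=(2000, 2))     # clearly non-Gaussian marginals
Z = gaussianize_layer(X, rng)
# marginals of Z are approximately standard normal; making the joint
# Gaussian requires stacking many such layers, whose number the paper
# shows scales linearly with the dimension
```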
    COCKATIEL: COntinuous Concept ranKed ATtribution with Interpretable ELements for explaining neural net classifiers on NLP tasks. (arXiv:2305.06754v2 [cs.CL] CROSS LISTED)
    Transformer architectures are complex and their use in NLP, while it has engendered many successes, makes their interpretability or explainability challenging. Recent debates have shown that attention maps and attribution methods are unreliable (Pruthi et al., 2019; Brunner et al., 2019). In this paper, we present some of their limitations and introduce COCKATIEL, which successfully addresses some of them. COCKATIEL is a novel, post-hoc, concept-based, model-agnostic XAI technique that generates meaningful explanations from the last layer of a neural net model trained on an NLP classification task by using Non-Negative Matrix Factorization (NMF) to discover the concepts the model leverages to make predictions and by exploiting a Sensitivity Analysis to estimate accurately the importance of each of these concepts for the model. It does so without compromising the accuracy of the underlying model or requiring a new one to be trained. We conduct experiments in single and multi-aspect sentiment analysis tasks and we show COCKATIEL's superior ability to discover concepts that align with humans' on Transformer models without any supervision, we objectively verify the faithfulness of its explanations through fidelity metrics, and we showcase its ability to provide meaningful explanations in two different datasets.  ( 3 min )
    Efficient Model Selection for Predictive Pattern Mining Model by Safe Pattern Pruning. (arXiv:2306.13561v1 [stat.ML])
    Predictive pattern mining is an approach used to construct prediction models when the input is represented by structured data, such as sets, graphs, and sequences. The main idea behind predictive pattern mining is to build a prediction model by considering substructures, such as subsets, subgraphs, and subsequences (referred to as patterns), present in the structured data as features of the model. The primary challenge in predictive pattern mining lies in the exponential growth of the number of patterns with the complexity of the structured data. In this study, we propose the Safe Pattern Pruning (SPP) method to address the explosion of pattern numbers in predictive pattern mining. We also discuss how it can be effectively employed throughout the entire model building process in practical data analysis. To demonstrate the effectiveness of the proposed method, we conduct numerical experiments on regression and classification problems involving sets, graphs, and sequences.  ( 2 min )
    Prediction under Latent Subgroup Shifts with High-Dimensional Observations. (arXiv:2306.13472v1 [stat.ML])
    We introduce a new approach to prediction in graphical models with latent-shift adaptation, i.e., where source and target environments differ in the distribution of an unobserved confounding latent variable. Previous work has shown that as long as "concept" and "proxy" variables with appropriate dependence are observed in the source environment, the latent-associated distributional changes can be identified, and target predictions adapted accurately. However, practical estimation methods do not scale well when the observations are complex and high-dimensional, even if the confounding latent is categorical. Here we build upon a recently proposed probabilistic unsupervised learning framework, the recognition-parametrised model (RPM), to recover low-dimensional, discrete latents from image observations. Applied to the problem of latent shifts, our novel form of RPM identifies causal latent structure in the source environment, and adapts properly to predict in the target. We demonstrate results in settings where predictor and proxy are high-dimensional images, a context to which previous methods fail to scale.  ( 2 min )
    SPRT-based Efficient Best Arm Identification in Stochastic Bandits. (arXiv:2207.11158v3 [stat.ML] UPDATED)
    This paper investigates the best arm identification (BAI) problem in stochastic multi-armed bandits in the fixed confidence setting. The general class of the exponential family of bandits is considered. The existing algorithms for the exponential family of bandits face computational challenges. To mitigate these challenges, the BAI problem is viewed and analyzed as a sequential composite hypothesis testing task, and a framework is proposed that adopts the likelihood ratio-based tests known to be effective for sequential testing. Based on this test statistic, a BAI algorithm is designed that leverages the canonical sequential probability ratio tests for arm selection and is amenable to tractable analysis for the exponential family of bandits. This algorithm has two key features: (1) its sample complexity is asymptotically optimal, and (2) it is guaranteed to be $\delta-$PAC. Existing efficient approaches focus on the Gaussian setting and require Thompson sampling for the arm deemed the best and the challenger arm. Additionally, this paper analytically quantifies the computational expense of identifying the challenger in an existing approach. Finally, numerical experiments are provided to support the analysis.  ( 2 min )
    Convergence of First-Order Methods for Constrained Nonconvex Optimization with Dependent Data. (arXiv:2203.15797v2 [math.OC] UPDATED)
    We focus on analyzing the classical stochastic projected gradient methods under a general dependent data sampling scheme for constrained smooth nonconvex optimization. We show the worst-case rate of convergence $\tilde{O}(t^{-1/4})$ and complexity $\tilde{O}(\varepsilon^{-4})$ for achieving an $\varepsilon$-near stationary point in terms of the norm of the gradient of Moreau envelope and gradient mapping. While classical convergence guarantee requires i.i.d. data sampling from the target distribution, we only require a mild mixing condition of the conditional distribution, which holds for a wide class of Markov chain sampling algorithms. This improves the existing complexity for the constrained smooth nonconvex optimization with dependent data from $\tilde{O}(\varepsilon^{-8})$ to $\tilde{O}(\varepsilon^{-4})$ with a significantly simpler analysis. We illustrate the generality of our approach by deriving convergence results with dependent data for stochastic proximal gradient methods, adaptive stochastic gradient algorithm AdaGrad and stochastic gradient algorithm with heavy ball momentum. As an application, we obtain first online nonnegative matrix factorization algorithms for dependent data based on stochastic projected gradient methods with adaptive step sizes and optimal rate of convergence.  ( 2 min )
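    The setting can be illustrated with a toy sketch: projected stochastic gradient steps driven by samples from a sticky two-state Markov chain (so the data are dependent but mixing), with projection onto the unit box. All names, constants, and the objective here are our illustrative choices, not the paper's experiments.

```python
import numpy as np

def projected_sgd(grad, proj, sampler, x0, eta, n_steps):
    """Stochastic projected gradient with (possibly dependent) samples:
    x_{t+1} = proj(x_t - eta_t * grad(x_t, s_t)), eta_t = eta / sqrt(t)."""
    x = np.asarray(x0, dtype=float)
    for t in range(1, n_steps + 1):
        s = sampler()
        x = proj(x - (eta / np.sqrt(t)) * grad(x, s))
    return x

rng = np.random.default_rng(0)
state = {"z": 0}
centers = np.array([[0.2, 0.2], [0.8, 0.8]])

def sampler():
    # sticky two-state Markov chain: dependent draws, switches w.p. 0.2,
    # satisfying the mild mixing condition rather than i.i.d. sampling
    if rng.random() < 0.2:
        state["z"] = 1 - state["z"]
    return centers[state["z"]] + rng.normal(scale=0.05, size=2)

grad = lambda x, s: x - s                # gradient of 0.5 * ||x - s||^2
proj = lambda x: np.clip(x, 0.0, 1.0)    # projection onto the unit box
x_hat = projected_sgd(grad, proj, sampler, x0=[0.0, 0.0], eta=0.5, n_steps=5000)
# x_hat approaches the stationary-mean minimizer (0.5, 0.5)
```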
    Understanding quantum machine learning also requires rethinking generalization. (arXiv:2306.13461v1 [quant-ph])
    Quantum machine learning models have shown successful generalization performance even when trained with few data. In this work, through systematic randomization experiments, we show that traditional approaches to understanding generalization fail to explain the behavior of such quantum models. Our experiments reveal that state-of-the-art quantum neural networks accurately fit random states and random labeling of training data. This ability to memorize random data defies current notions of small generalization error, problematizing approaches that build on complexity measures such as the VC dimension, the Rademacher complexity, and all their uniform relatives. We complement our empirical results with a theoretical construction showing that quantum neural networks can fit arbitrary labels to quantum states, hinting at their memorization ability. Our results do not preclude the possibility of good generalization with few training data but rather rule out any possible guarantees based only on the properties of the model family. These findings expose a fundamental challenge in the conventional understanding of generalization in quantum machine learning and highlight the need for a paradigm shift in the design of quantum models for machine learning tasks.  ( 2 min )
    Variational Counterfactual Prediction under Runtime Domain Corruption. (arXiv:2306.13271v1 [cs.LG])
    To date, various neural methods have been proposed for causal effect estimation based on observational data, where a default assumption is the same distribution and availability of variables at both training and inference (i.e., runtime) stages. However, distribution shift (i.e., domain shift) could happen during runtime, and bigger challenges arise from the impaired accessibility of variables. This is commonly caused by increasing privacy and ethical concerns, which can make arbitrary variables unavailable in the entire runtime data and imputation impractical. We term the co-occurrence of domain shift and inaccessible variables runtime domain corruption, which seriously impairs the generalizability of a trained counterfactual predictor. To counter runtime domain corruption, we subsume counterfactual prediction under the notion of domain adaptation. Specifically, we upper-bound the error w.r.t. the target domain (i.e., runtime covariates) by the sum of source domain error and inter-domain distribution distance. In addition, we build an adversarially unified variational causal effect model, named VEGAN, with a novel two-stage adversarial domain adaptation scheme to reduce the latent distribution disparity between treated and control groups first, and between training and runtime variables afterwards. We demonstrate that VEGAN outperforms other state-of-the-art baselines on individual-level treatment effect estimation in the presence of runtime domain corruption on benchmark datasets.  ( 2 min )
    On tracking varying bounds when forecasting bounded time series. (arXiv:2306.13428v1 [stat.ML])
    We consider a new framework where a continuous, though bounded, random variable has unobserved bounds that vary over time. In the context of univariate time series, we look at the bounds as parameters of the distribution of the bounded random variable. We introduce an extended log-likelihood estimation and design algorithms to track the bound through online maximum likelihood estimation. Since the resulting optimization problem is not convex, we make use of recent theoretical results on Normalized Gradient Descent (NGD) for quasiconvex optimization, to eventually derive an Online Normalized Gradient Descent algorithm. We illustrate and discuss the workings of our approach based on both simulation studies and a real-world wind power forecasting problem.  ( 2 min )
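    Stripped of the bound-tracking application, the Normalized Gradient Descent step used here looks like this on a toy quasiconvex objective; the objective and step schedule are our illustrative choices, not the paper's forecasting setup.

```python
import numpy as np

def ngd(grad, x0, eta, n_steps):
    """Normalized Gradient Descent: step along g / ||g|| with a decaying
    step size, suitable for quasiconvex objectives where raw gradient
    magnitudes are uninformative."""
    x = np.asarray(x0, dtype=float)
    for t in range(1, n_steps + 1):
        g = grad(x)
        norm = np.linalg.norm(g)
        if norm == 0.0:          # exact minimizer reached
            break
        x = x - (eta / np.sqrt(t)) * g / norm
    return x

# quasiconvex but non-convex toy objective f(x) = sqrt(|x - 3|),
# whose gradient blows up near the minimum and vanishes far from it
grad_f = lambda x: np.sign(x - 3.0) * 0.5 / np.sqrt(np.abs(x - 3.0) + 1e-12)
x_star = ngd(grad_f, x0=[10.0], eta=1.0, n_steps=2000)
# the normalized step ignores the gradient scale, so the iterate
# still converges to the minimizer at x = 3
```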
    STEEL: Singularity-aware Reinforcement Learning. (arXiv:2301.13152v4 [stat.ML] UPDATED)
    Batch reinforcement learning (RL) aims at leveraging pre-collected data to find an optimal policy that maximizes the expected total rewards in a dynamic environment. Nearly all existing algorithms rely on the absolutely continuous assumption on the distribution induced by target policies with respect to the data distribution, so that the batch data can be used to calibrate target policies via the change of measure. However, the absolute continuity assumption could be violated in practice (e.g., no-overlap support), especially when the state-action space is large or continuous. In this paper, we propose a new batch RL algorithm without requiring absolute continuity in the setting of an infinite-horizon Markov decision process with continuous states and actions. We call our algorithm STEEL: SingulariTy-awarE rEinforcement Learning. Our algorithm is motivated by a new error analysis on off-policy evaluation, where we use maximum mean discrepancy, together with distributionally robust optimization, to characterize the error of off-policy evaluation caused by the possible singularity and to enable model extrapolation. By leveraging the idea of pessimism and under some mild conditions, we derive a finite-sample regret guarantee for our proposed algorithm without imposing absolute continuity. Compared with existing algorithms, by requiring only minimal data-coverage assumption, STEEL significantly improves the applicability and robustness of batch RL. Extensive simulation studies and one real experiment on personalized pricing demonstrate the superior performance of our method in dealing with possible singularity in batch RL.  ( 3 min )
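    The maximum mean discrepancy that the error analysis relies on has a standard unbiased estimator. A minimal sketch with an RBF kernel follows; the bandwidth `gamma` and the toy data are our choices, and this is the discrepancy measure itself, not STEEL's full off-policy evaluation pipeline.

```python
import numpy as np

def mmd2(X, Y, gamma=1.0):
    """Unbiased estimate of the squared Maximum Mean Discrepancy between
    samples X and Y under an RBF kernel exp(-gamma * ||a - b||^2)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    n, m = len(X), len(Y)
    np.fill_diagonal(Kxx, 0.0)   # drop self-similarities for unbiasedness
    np.fill_diagonal(Kyy, 0.0)
    return (Kxx.sum() / (n * (n - 1))
            + Kyy.sum() / (m * (m - 1))
            - 2.0 * Kxy.mean())

rng = np.random.default_rng(0)
same = mmd2(rng.normal(size=(300, 2)), rng.normal(size=(300, 2)))
diff = mmd2(rng.normal(size=(300, 2)), rng.normal(2.0, 1.0, size=(300, 2)))
# same is near zero; diff is clearly positive, detecting the mean shift
```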
    Lower Complexity Adaptation for Empirical Entropic Optimal Transport. (arXiv:2306.13580v1 [math.ST])
    Entropic optimal transport (EOT) presents an effective and computationally viable alternative to unregularized optimal transport (OT), offering diverse applications for large-scale data analysis. In this work, we derive novel statistical bounds for empirical plug-in estimators of the EOT cost and show that their statistical performance in the entropy regularization parameter $\epsilon$ and the sample size $n$ only depends on the simpler of the two probability measures. For instance, under sufficiently smooth costs this yields the parametric rate $n^{-1/2}$ with factor $\epsilon^{-d/2}$, where $d$ is the minimum dimension of the two population measures. This confirms that empirical EOT also adheres to the lower complexity adaptation principle, a hallmark feature only recently identified for unregularized OT. As a consequence of our theory, we show that the empirical entropic Gromov-Wasserstein distance and its unregularized version for measures on Euclidean spaces also obey this principle. Additionally, we comment on computational aspects and complement our findings with Monte Carlo simulations. Our techniques employ empirical process theory and rely on a dual formulation of EOT over a single function class. Crucial to our analysis is the observation that the entropic cost-transformation of a function class does not increase its uniform metric entropy by much.  ( 2 min )
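    The empirical plug-in EOT cost studied here is typically computed with Sinkhorn iterations. A minimal sketch on two small discrete measures follows; the iteration count and $\epsilon$ value are illustrative, and smaller $\epsilon$ approaches unregularized OT at the price of slower convergence.

```python
import numpy as np

def sinkhorn(a, b, C, eps, n_iter=1000):
    """Sinkhorn iterations for entropic OT between discrete measures a, b
    with cost matrix C and regularization eps. Returns the transport plan."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)   # match column marginals
        u = a / (K @ v)     # match row marginals
    return u[:, None] * K * v[None, :]

# two small discrete measures on the line, squared-distance cost
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.5, 1.5])
C = (x[:, None] - y[None, :]) ** 2
a = np.full(3, 1.0 / 3.0)
b = np.full(2, 1.0 / 2.0)
P = sinkhorn(a, b, C, eps=0.5)
cost = (P * C).sum()   # empirical plug-in entropic transport cost
```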
    An Agnostic View on the Cost of Overfitting in (Kernel) Ridge Regression. (arXiv:2306.13185v1 [stat.ML])
    We study the cost of overfitting in noisy kernel ridge regression (KRR), which we define as the ratio between the test error of the interpolating ridgeless model and the test error of the optimally-tuned model. We take an "agnostic" view in the following sense: we consider the cost as a function of sample size for any target function, even if the sample size is not large enough for consistency or the target is outside the RKHS. We analyze the cost of overfitting under a Gaussian universality ansatz using recently derived (non-rigorous) risk estimates in terms of the task eigenstructure. Our analysis provides a more refined characterization of benign, tempered and catastrophic overfitting (qv Mallinar et al. 2022).  ( 2 min )
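    The "cost of overfitting" ratio can be probed numerically on toy data. In this hedged sketch the RBF kernel, bandwidth, noise level, and tuning grid are all our choices; the grid contains the near-zero ridge, so the ratio is at least 1 by construction.

```python
import numpy as np

def krr_test_mse(X, y, Xt, yt, lam, gamma=5.0):
    """Kernel ridge regression with an RBF kernel; returns test MSE.
    A vanishing lam gives the (ridgeless) interpolating model."""
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    alpha = np.linalg.solve(rbf(X, X) + lam * np.eye(len(X)), y)
    pred = rbf(Xt, X) @ alpha
    return float(np.mean((pred - yt) ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(60, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.5, size=60)   # noisy labels
Xt = rng.uniform(-1, 1, size=(500, 1))
yt = np.sin(3 * Xt[:, 0])                                  # noiseless target
ridgeless = krr_test_mse(X, y, Xt, yt, lam=1e-8)
tuned = min(krr_test_mse(X, y, Xt, yt, lam=l)
            for l in np.logspace(-8, 1, 25))
overfitting_cost = ridgeless / tuned
# a cost near 1 indicates benign overfitting; a large cost, catastrophic
```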
    Adversarial Resilience in Sequential Prediction via Abstention. (arXiv:2306.13119v1 [cs.LG])
    We study the problem of sequential prediction in the stochastic setting with an adversary that is allowed to inject clean-label adversarial (or out-of-distribution) examples. Algorithms designed to handle purely stochastic data tend to fail in the presence of such adversarial examples, often leading to erroneous predictions. This is undesirable in many high-stakes applications such as medical recommendations, where abstaining from predictions on adversarial examples is preferable to misclassification. On the other hand, assuming fully adversarial data leads to very pessimistic bounds that are often vacuous in practice. To capture this motivation, we propose a new model of sequential prediction that sits between the purely stochastic and fully adversarial settings by allowing the learner to abstain from making a prediction at no cost on adversarial examples. Assuming access to the marginal distribution on the non-adversarial examples, we design a learner whose error scales with the VC dimension of the hypothesis class (mirroring the stochastic setting), as opposed to the Littlestone dimension which characterizes the fully adversarial setting. Furthermore, we design a learner for VC dimension 1 classes, which works even in the absence of access to the marginal distribution. Our key technical contribution is a novel measure for quantifying uncertainty for learning VC classes, which may be of independent interest.

  • Open

    With great power comes great ... scams, just scams
    submitted by /u/enspiralart [link] [comments]  ( 8 min )
    Keep getting an RVC error
trying to use RVC on Linux, and I keep getting a python trainset_preprocess_pipline_print.py error anytime I try to process anything. Regardless of the path, spelling, or characters in the path, anything and everything spits out that exact error. I put up a GitHub issue on the RVC page, but when I logged off I didn't see it, so I guess GitHub shadowbans VPN users or whatever, as I created the account for that issue. Oh well. submitted by /u/sovereign-celestial [link] [comments]  ( 8 min )
    Real-time male-female vocal RVC model. With Italian dataset.
I was experimenting with RVC and created a model for male-to-female voice conversion. The result is acceptable, even if the voice still sounds a bit robotic, and the conversion has a delay of just under a second. I've noticed that RVC can render accents much better than software like voice.ai. I'm posting a demo of my model and taking the opportunity to ask if you know of any other AI-based voice changers that work well with languages other than English. https://reddit.com/link/14ix7bl/video/ihoxh8p2a88b1/player submitted by /u/qubit__ [link] [comments]  ( 8 min )
    AI is a BEAST! Use it, or get left BEHIND!
    submitted by /u/LinedVaccination [link] [comments]  ( 8 min )
    Does anyone know of any good forums pertaining to AI discussion?
I've only come across a couple; I even had ChatGPT attempt to find me some obscure ones. submitted by /u/Maelasae [link] [comments]  ( 8 min )
    Everything You Need to Know about AI
A CNN writer discusses the danger of AI, quoting: "Top AI executives have warned that AI could potentially bring about human extinction. But these same executives are also racing to deploy the technology into their products." What do you think? Is AI dangerous? Or can AI become dangerous in the future? I personally think AI can have a huge impact on the workforce; everything is getting smarter, with greater capabilities than ever before. This shift to AI will fuel the next generation of technology, and perhaps the next generation of technology will fuel new dangers... Waiting for your thoughts below! submitted by /u/Hello_I_am_newell [link] [comments]  ( 8 min )
    AI on Trump's lies.
    Prompt: Has Donald J Trump ever lied? Bing AI: "Donald Trump made tens of thousands of false or misleading claims". ChatGPT: "Donald J. Trump has been known to make false or misleading statements throughout his political career". Google Bard: "Whether or not Donald Trump has ever lied is a matter of opinion." submitted by /u/decorama [link] [comments]  ( 8 min )
    AI Gold Explosions Series [workflow in comments]
    submitted by /u/PeePeePeePooPooPooo [link] [comments]  ( 8 min )
    Does anybody know an AI tool where you can upload a video and it will tell you what the video is all about? Or at least the summary of it?
I have a lot of online courses to watch (approximately 50 hours of content) but due to time constraints, I am forced to take a shortcut. I want to know if there is an available AI tool where you can upload your video (mp4, mkv, etc.), at least 10 minutes in duration, and have it summarized for you? Or at least get the most important details possible. If you know any, I'd appreciate your help a lot. Thank you! submitted by /u/virtuabart [link] [comments]  ( 8 min )
    I made a new Michael Jackson album - with AI.
    Using RVC, I made some AI covers, original songs and reconstructions with the voice of Michael Jackson. Check it out! submitted by /u/Some_Cryptographer_9 [link] [comments]  ( 8 min )
    The bigger-is-better approach to AI is running out of road
    submitted by /u/bartturner [link] [comments]  ( 8 min )
    One-Minute Daily AI News 6/24/2023
    Sony Music has created a new position — Executive VP Of AI, hiring former BPI CEO Geoff Taylor, according to Billboard, who obtained an internal memo from the company. The new role will involve Sony “coordinating its business efforts surrounding AI” and “coordinating across the global digital business and business and legal affairs division.”[1] Prime Minister Narendra Modi of India held meetings with prominent U.S. and Indian technology executives, such as Tim Cook from Apple, Sundar Pichai from Google, and Satya Nadella from Microsoft. While the details of their conversation were kept private, the statement highlighted the discussion of technology’s power, specifically AI.[2] AI startup Stability AI just debuted its latest version of Stable Diffusion—and the model does not disappoint. Stable Diffusion XL (SDXL) v0.9 delivers ultra-photorealistic imagery, surpassing previous iterations in terms of sophistication and visual quality. This means, among other things, that Stability AI’s new model will not generate those troublesome “spaghetti hands” so often.[3] With no official photos of the Titan wreckage released, unscrupulous Twitter and Facebook accounts have stepped in with fake photos they created using AI image generators.[4] Sources: [1] https://www.stereogum.com/2228177/sony-music-creates-executive-role-dedicated-to-ai/news/ [2] https://www.businesstoday.in/technology/news/story/pm-modi-meets-microsoft-ceo-satya-nadella-discusses-ai-and-technology-cooperation-with-india-386932-2023-06-24 [3] https://decrypt.co/145842/stable-diffusion-upgrade-ai-images-midjourney [4] https://www.pcmag.com/news/ai-generated-images-of-titan-submersible-debris-hit-twitter-facebook submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    AI Tools to making short clips automatically.
Hi, first of all, if this sub isn't the proper place to ask questions like this, warn me in the comments please. I am searching for an AI tool that can make short content for platforms such as TikTok, Instagram Reels, YouTube Shorts, etc. I'm looking for something where the AI makes the content out of full-length videos, analyzes the edits, rates the edits, decides which edit is most interesting to watch, and serves it to me. Basically all I have left is sharing the post on the platforms :) Is there a way to do it? submitted by /u/besterk [link] [comments]  ( 8 min )
    Taking Game AI Beyond Pathfinding and Combat: Building NPCs That Remember and Evolve
    Hi PC gamers, ever wondered what happens in a game world when you're not around? In my current project, the answer is - a lot! I'm working on creating a game world where NPCs don't just exist for player interaction but live their own lives. Every NPC remembers your interactions and has conversations with other NPCs based on those experiences. Even when you're logged off, the world evolves. An NPC could rise from being a town crier to mayor based on their actions and the dynamics they create with other NPCs. Imagine booting up your game after a week, returning to a town to find things have changed in your absence. The guard you'd befriended has become the captain, the merchant you helped is now prosperous, and the rumors about that dragon you slayed have spread across the kingdom, making you a hero even in places you've never visited. It's about creating a game world that’s not static, but dynamic and ever-changing. Want to follow along on this journey, discuss, or contribute ideas? Join us on Discord https://discord.com/invite/4mnzkYSJDg - let's shape this new era of gaming together! submitted by /u/Chance_Confection_37 [link] [comments]  ( 8 min )
    Dee Jesus and friends On the wheels of steel 🎛🎚🎧
    submitted by /u/StoicPrinciples [link] [comments]  ( 8 min )
  • Open

    [R] Deep Reinforcement Learning Policies Learn Shared Adversarial Features across MDPs
    https://ojs.aaai.org/index.php/AAAI/article/view/20684/20443 submitted by /u/ml_dnn [link] [comments]  ( 8 min )
    [Discussion] LLM classification with large number of classes.
Hello and welcome back! I'm working on a classification task with an LLM. It did work with the first version when I had "only" 5k classes, but now I have a bit more than double that, and it is pretty hard to fit (as, of course, the number of samples was also multiplied by ~5). Is there any easy-to-implement way to do that? I've seen "Hierarchical Softmax", but 1/ I didn't find a PyTorch implementation to use with transformers, and 2/ I wondered if there was a more recent (and better) way to deal with this kind of problem? Thank you in advance! submitted by /u/ez613 [link] [comments]  ( 8 min )
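For context, hierarchical softmax factors the output distribution as p(class) = p(group) × p(class | group), so each step scores only num_groups + classes-per-group logits instead of all 10k+. A minimal pure-Python sketch (the grouping and logit values here are made up for illustration):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def hierarchical_probs(group_logits, leaf_logits_per_group):
    """p(class) = p(group) * p(class | group); scoring cost drops from
    O(total classes) to O(num groups + classes per group)."""
    p_group = softmax(group_logits)
    probs = []
    for pg, leaf_logits in zip(p_group, leaf_logits_per_group):
        probs.extend(pg * pl for pl in softmax(leaf_logits))
    return probs

# Two made-up groups with two classes each.
probs = hierarchical_probs([0.5, -0.2], [[1.0, 0.0], [0.3, 0.7]])
print(len(probs), round(sum(probs), 6))  # 4 classes, probabilities sum to 1.0
```

In practice the grouping can come from clustering class embeddings or from a label hierarchy; at training time only the group head and the one relevant leaf head need to be evaluated.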
    [N] How OpenAI's unique equity compensation works
    In 2019, OpenAI changed from a nonprofit to a ‘capped profit’ model in an effort to raise capital while still serving their mission. Per their blog post, “The fundamental idea of OpenAI LP is that investors and employees can get a capped return if we succeed at our mission, which allows us to raise investment capital and attract employees with startup-like equity. But any returns beyond that amount—and if we are successful, we expect to generate orders of magnitude more value than we’d owe to people who invest in or work at OpenAI LP—are owned by the original OpenAI Nonprofit entity.” How does an OpenAI offer work? There are two components to OpenAI’s offer: the base salary and the ‘equity,’ which they call ‘Profit Participation Units’ or ‘PPUs.’ The base salary is standard and self-explanatory. It’s the cash paycheck you get every two weeks. The profit participation units are where it starts to get confusing, as companies sometimes use profit participation or profit interest grants differently. Here’s a recent OpenAI offer (see all OpenAI offers here), where you can see a base salary of $300k/year with a $2M PPU grant that vests evenly over 4 years, bringing the yearly total compensation to $800k. https://www.levels.fyi/blog/openai-compensation.html submitted by /u/That_Violinist_18 [link] [comments]  ( 8 min )
    Additional Resources
    Hi everyone, After an extended blackout, we've decided to reopen the sub since it became pretty clear that if we didn't then the admins would likely replace us and just reopen anyway. We know lots of you contacted us during the blackout trying to understand how to stay up to date with the latest ML research, news, and discussions. For that reason we are providing additional resources below that either exclusively focus on ml or often discuss ml: taggernews - an ml powered classifier for hackernews posts tagged as ai/ml lobste.rs/t/ai - posts tagged as ai on lobste.rs m/machinelearning - a nascent space for ml discussion on kbin You can also find a more thorough list of subs with additional resources here: https://sub.rehab If you have additional resources you think would be useful, please comment below and we can add them to the list. submitted by /u/dojoteef [link] [comments]  ( 8 min )
  • Open

    Is there a way to save sb3 model as torchscript?
I trained a PPO model using sb3, but I am required to have the model as a TorchScript .pt file due to some limitations. Is there a way to convert my model into a TorchScript model? submitted by /u/naughty_ningen [link] [comments]  ( 8 min )
    "Relating Neural Text Degeneration to Exposure Bias", Chiang & Chen 2021
    submitted by /u/gwern [link] [comments]  ( 8 min )
  • Open

    LLM Limitations and Hallucinations
    submitted by /u/Neurosymbolic [link] [comments]  ( 8 min )
    A C++ version of micrograd, the simplest autograd engine for neural nets.
I implemented cpp-micrograd, a C++ version of Karpathy's autograd engine: https://github.com/10-zin/cpp-micrograd If you find it interesting, please support the project with stars and contributions. Given the recent breakthrough of C/C++ versions of neural nets, like gerganov's llama.cpp, it made a lot of sense to build some neural nets with C/C++, hence cpp-micrograd. Albeit a toy version, it gives a good understanding of how C++ would implement basic neural nets. IMO it's a very good start to understanding and using C/C++ neural nets like ggml, since no matter how complex the network, the basic autograd computation graph will always be the core. The vision is to make a C implementation too, and then go all the way to launching CUDA kernels, while making it as educational as possible. Here is what value you can get from the repo currently: if you love micrograd but would also want a C++ version of it; if you want crisp backprop theory and annotated code for the autograd engine; if you want to learn C++ by building neural nets, this could be a good start (it was my purpose). Thanks for reading! submitted by /u/10zin_ [link] [comments]  ( 8 min )
    Open source NN
Hey guys, where can I get free or low-cost trained neural networks? Say, for identifying fish? submitted by /u/benjaminsmith1981 [link] [comments]  ( 8 min )
    The bigger-is-better approach to AI is running out of road
    submitted by /u/nickb [link] [comments]  ( 8 min )

  • Open

    Hello
    It's nice to see you here on reddit! :) submitted by /u/endrid [link] [comments]  ( 8 min )
    AI Blood in the City (workflow in the comments)
    submitted by /u/PeePeePeePooPooPooo [link] [comments]  ( 8 min )
    Byte-Sized Battles: Merging Philosophical Legends and Cutting-Edge AI Technology
    Greetings, fellow minds of r/artificial! Get ready to have your intellectual gears cranked up to maximum capacity. I bring to you an experimental podcast that combines the profound wisdom of past philosophers with the power of cutting-edge technology. Introducing the mind-bending podcast: Byte-Sized Battles Have you ever pondered what it'd be like to listen to Socrates and Descartes lock horns in a battle of ideas? Or perhaps you've envisioned Nietzsche and Simone de Beauvoir engaging in a gripping debate on the morality of lying? Prepare to have your curiosity satisfied! ‘Byte-Sized Battles’ resurrects these philosophical legends, allowing them to engage in captivating discussions through the magic of artificially generated voices. Now, before you raise an eyebrow and question the legitimacy of this groundbreaking concept, let's take a moment to delve into the fascinating questions it raises: Is it acceptable to create a podcast about philosophy using artificial intelligence? And more broadly, how should we navigate the vast sea of knowledge in this ever-evolving world? The beauty of ‘Byte-Sized Battles’ lies in its ability to showcase both the advantages and disadvantages of AI technology. Through engaging conversations, we hope to initiate a profound exploration of its potential role in the future of podcasting. Let's embark on this journey together! The texts are carefully crafted using ChatGPT, while the voices themselves are brought to life by Genny. We invite you to listen in and join the discussion. Share your thoughts, opinions, and musings—it's all part of the experience! 😊 submitted by /u/Lukas_Bjerg [link] [comments]  ( 9 min )
    I wonder what the NSA does with those petabytes of data every day.
    submitted by /u/katiecharm [link] [comments]  ( 8 min )
    AI to Solve Phone Scams
    I bet a portion of you reading this already know exactly what it contains based on the title. Let's make it happen! If you are not aware, there is an epidemic of phone scams occurring for the past several years. There are entire call centers, mostly located in India, dedicated to scamming American & European senior citizens out of hundreds to hundreds-of-thousands (choose any currency, though mostly in USD). The details on how the scams work vary, but one thing generally remains the same throughout: it is the victim that initiates a phone call to the scam call center to request "tech support", generally in order to remove malware from the victim's machine (malware creates a Windows pop-up that tells the victim that they have been infected by a virus, and provides the victim a phone numb…  ( 9 min )
    AI Faceswap Music Video
    submitted by /u/MoistGranniesASA [link] [comments]  ( 8 min )
    Can I learn AI from home to the point of being hired for it?
    I was laid off but have about 18 months of expense saved. I want to use this time to retrain in AI. I’d like to do it in 3-5 months though, since most jobs take a long time from hire to start. Please let me know what your recommendations would be? submitted by /u/Tasty-Window [link] [comments]  ( 8 min )
    One-Minute Daily AI News 6/23/2023
    Prime Minister Narendra Modi received a special t-shirt as a gift from Joe Biden on Friday which had his quote on AI printed on it - 'The future is AI - America & India'. PM Modi, during his address to the joint sitting of the US Congress, gave a new definition for AI - America and India.[1] A new generative AI tool(Opens in a new window) is helping designers in the Toyota Research Institute (TRI) get a head start on creating new vehicles.[2] Wimbledon is introducing AI-powered commentary to its coverage this year. The All England Club has teamed up with tech group IBM to offer AI-generated audio commentary and captions in its online highlights videos.[3] Over 1,200 computer hackers from around the world packed UC Berkeley’s Martin Luther King Jr. Student Union last weekend during a 36-hour AI learning language model (LLM) hackathon that Berkeley leaders say was the largest event of its kind.[4] Sources: [1] https://www.ndtv.com/india-news/joe-biden-gifts-special-t-shirt-to-pm-narendra-modi-with-quote-on-ai-america-india-4148271/amp/1 [2] https://www.pcmag.com/news/toyota-is-using-generative-ai-to-design-new-evs [3] https://amp.theguardian.com/sport/2023/jun/21/wimbledon-introduce-ai-powered-commentary-to-coverage-this-year [4] https://news.berkeley.edu/2023/06/22/uc-berkeley-cultivates-festive-culture-of-free-thinkers-at-ai-hackathon/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    Does anyone know of an affordable or free text to speech AI that can be used commercially?
    Title says it all. Thank u in advance submitted by /u/yea_okay_dude [link] [comments]  ( 8 min )
    Voice changer that generates unique voices?
I have been searching for a free voice2voice converter, and the most promising ones are: voice.ai, Retrieval Based Voice Conversion, so-vits-svc, and Tortoise TTS. What about creating original voices, though? Not sure if it is possible, but I thought of cloning two or more existing voices, then merging them together with audio editing software. Probably easier said than done. submitted by /u/Sculptor_THS [link] [comments]  ( 8 min )
  • Open

    Deep Reinforcement Learning Policies Learn Shared Adversarial Features across MDPs
    submitted by /u/ml_dnn [link] [comments]  ( 8 min )
    BEST Python Libraries when getting started in Machine Learning!
    submitted by /u/keghn [link] [comments]  ( 8 min )
    How do people handle mapping variable length arrays to inputs?
I've been reading about the impressive AI they created for the digital version of Dominion, and they talk about mapping the card abilities to an embedding layer, which I think I understand. But the part I'm struggling to get my mind around is how one would actually map a card game where the draw, discard and hand sizes change constantly. I've seen there's zero padding and recurrent neural networks, but the diagrams (https://www.templegatesgames.com/dominion-ai/) don't look like RNNs, the article makes no mention of them, and zero padding also seems a bit wild given the large upper limits. In every neural network example, the scenario is something like image recognition or Snake or something else that maps well to the fixed-length inputs of NNs. I thought one possibility might be giving a neural network the average number of times a particular card ability appears in one of those arrays. So for instance, if the game had "draw 3 cards" as the highest draw effect possible, and we had one card with "draw 1 card" in our draw pile of 5 cards, we'd pass 1/15, assuming the other cards didn't draw at all. Does that sound reasonable? submitted by /u/lawrieee [link] [comments]  ( 9 min )
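One common trick for variable-length collections (a hypothetical sketch, not the Temple Gates implementation): pool per-card embedding vectors into a single fixed-size vector, so hand, draw, and discard sizes no longer matter to the input layer. In pure Python with made-up 3-dimensional embeddings:

```python
# Fixed-size representation of a variable-length hand: mean-pool per-card
# embeddings. The network input has the same size for any hand size.
EMB = {                         # hypothetical 3-dim card embeddings
    "copper":  [1.0, 0.0, 0.0],
    "smithy":  [0.0, 1.0, 0.0],  # a "draw 3 cards"-style effect
    "village": [0.0, 0.5, 0.5],
}

def encode_hand(hand, dim=3):
    """Mean of the card embeddings; an empty hand maps to the zero vector."""
    if not hand:
        return [0.0] * dim
    total = [0.0] * dim
    for card in hand:
        for i, v in enumerate(EMB[card]):
            total[i] += v
    # Dividing by hand size keeps the scale independent of how many cards we hold.
    return [v / len(hand) for v in total]

print(encode_hand(["copper", "smithy"]))   # [0.5, 0.5, 0.0]
print(len(encode_hand(["copper"] * 7)))    # always 3, regardless of hand size
```

Summing instead of averaging preserves counts (closer to the "how many draw effects do I hold" idea in the post); either way the representation is permutation-invariant, which suits an unordered hand.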
    Learning ANN
Hello, I am a finance graduate student and want to learn how I can use ANNs for volatility analysis of three stock market indices for my homework. Could you please kindly advise some books or videos and tips? I want to learn, thanks in advance 🙌🏻 submitted by /u/here-for-reading [link] [comments]  ( 8 min )
    Please help: Getting an Value Error from using the mean squared error loss function
I am training a feed-forward neural network that takes in input of shape (4040,2) and output of shape (4040,4,51). I tried using a pandas DataFrame for it first; however, it does not seem to work with a 3-dimensional output shape. (If there is a way to convert 3-dimensional data into a pandas DataFrame, please let me know.) So for now I am using numpy arrays to store the inputs and outputs. But they are not the same shape, and this raises an error when going through the mean squared error loss function. Any comments, solutions and suggestions are greatly appreciated! The code I have is the following: #using functional API def build_model90(): input_layer = Input(shape = repeated_norm_train_dime.shape) dense_1 = Dense(units='1024', activation='relu')(input_layer) dense_2 = Dense(units=…  ( 9 min )
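The shape mismatch is likely the root cause here: MSE compares prediction and target elementwise, so the model's per-sample output must have shape (4, 51) to match the targets (in Keras, e.g., a final Dense layer with 4*51 units followed by a Reshape((4, 51))). A minimal pure-Python sketch of why the error fires (this mse helper is illustrative, not the Keras internals):

```python
def mse(pred, target):
    """Elementwise MSE over nested lists; raises ValueError on shape mismatch,
    which is essentially what the framework's loss does under the hood."""
    def shape(x):
        return (len(x),) + shape(x[0]) if isinstance(x, list) else ()
    if shape(pred) != shape(target):
        raise ValueError(f"shape mismatch: {shape(pred)} vs {shape(target)}")
    def flat(x):
        for v in x:
            yield from flat(v) if isinstance(v, list) else (v,)
    p, t = list(flat(pred)), list(flat(target))
    return sum((a - b) ** 2 for a, b in zip(p, t)) / len(p)

print(mse([[1.0, 2.0]], [[1.0, 4.0]]))  # 2.0
```

Also note that `Input(shape=...)` conventionally takes the per-sample shape (here (2,)), not the shape of the whole dataset.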
    CNN & imbalanced data
Hi, I've been trying to implement a 1D CNN for binary classification on a dataset, but it's heavily imbalanced (roughly 518 zeros and 90 ones). I've been using cross-validation to evaluate performance, and I always seem to hit around ~85% accuracy but only about 20% for recall and precision. Are there any suggestions? I've tried SMOTE on the training data after forming x_train and such within the k-fold loop, but it doesn't lead to much success. The only time I've been able to reach high figures for recall and precision is when I've used SMOTE on the whole dataset via 'smt.fit_resample(X, Y)', but I guess that's causing data leakage. Thanks in advance. submitted by /u/acr15555 [link] [comments]  ( 8 min )
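One common alternative (or complement) to SMOTE, which in any case should only be fit on the training split of each fold, is weighting the loss by inverse class frequency. A pure-Python sketch of sklearn-style "balanced" weights, using the class counts mentioned above:

```python
from collections import Counter

def balanced_class_weights(labels):
    """sklearn-style 'balanced' weights: n_samples / (n_classes * count_c).
    The minority class gets a proportionally larger weight in the loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

labels = [0] * 518 + [1] * 90          # the class balance described above
w = balanced_class_weights(labels)
print(round(w[0], 3), round(w[1], 3))  # 0.587 3.378
```

In Keras this dict can typically be passed as `class_weight` to `fit`; also consider tuning the decision threshold and reporting F1 or PR-AUC, since accuracy is misleading at this imbalance.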
  • Open

    minimax for imperfect-information turn-games?
How does one implement minimax strategies in Poker or similar imperfect-information turn games? If you try to minimize your losses, you always have to assume the other agent has better cards than you, and you'd end up always folding, right? (Well, unless you have the best hand possible, but that would still lead to a sub-optimal, over-cautious strategy.) Does that mean minimax is only for perfect-information turn games? submitted by /u/Lindayz [link] [comments]  ( 8 min )
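Roughly speaking: in imperfect-information games one maximizes expected value over the information set (a distribution over the opponent's hidden cards) rather than assuming the worst case; fully game-theoretic poker solvers go further (e.g. counterfactual regret minimization). A toy expected-value sketch with made-up payoffs and probabilities:

```python
# Expected-value decision over an information set: instead of assuming the
# opponent always holds the worst possible hand (pure minimax), weight each
# hidden hand by its probability.
def best_action(actions, hidden_states, payoff):
    """hidden_states: list of (state, probability); payoff(action, state) -> float."""
    def ev(a):
        return sum(p * payoff(a, s) for s, p in hidden_states)
    return max(actions, key=ev)

# Toy hand: folding is always 0; calling wins 1 vs a weak hand (70% likely)
# and loses 1 vs a strong hand (30% likely).
payoffs = {("call", "weak"): 1.0, ("call", "strong"): -1.0,
           ("fold", "weak"): 0.0, ("fold", "strong"): 0.0}
act = best_action(["call", "fold"], [("weak", 0.7), ("strong", 0.3)],
                  lambda a, s: payoffs[(a, s)])
print(act)  # call (EV 0.4 beats folding's 0.0); pure minimax would fold
```

This is why a minimax agent over-folds: it evaluates only the worst branch, while the expectation correctly values risky-but-profitable calls.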
    RL for satisfying constraint for some percentage of times?
Hello, I was wondering if there are RL algorithms or methods where I can have the agent satisfy a constraint a certain percentage of the time? I am using RL for optimal control and, as of now, I pass the agent a mask that limits the actions, but I don't need the constraint to be strict, and I would actually prefer the agent to have some more freedom to choose; hopefully it can find more space for optimization. Any resources would be great, thanks submitted by /u/LeSUTHU [link] [comments]  ( 8 min )
    I want to build a learning agent for a combinatorial game
I am working on building a learning agent for a combinatorial game called Clobber. We are currently using a technique that associates belief values with moves and takes a weighted average; it works well, but I want to delve into some other techniques which may help improve the agent (say, work on larger board sizes). How and where should I make the environment? What sort of algorithms should I use? submitted by /u/eoliankeeper [link] [comments]  ( 8 min )
    RL and cognitive science
Hello everyone, I'm looking for labs in Europe that do RL combined with neuro/cognitive science. I know some labs in France doing that but not elsewhere. It's for a PhD. Do you have any clues? Would greatly appreciate it. submitted by /u/Jeffrosslostson [link] [comments]  ( 8 min )
    How to add values on a two-dimensional box in openai gym?
Hello, I have an OpenAI Gym observation Box with dimension (2,300): std::vector<uint32_t> shape = { nWifi, history_length, }; Ptr<OpenGymBoxContainer<uint32_t>> box = CreateObject<OpenGymBoxContainer<uint32_t>>(shape); I want to add values at indexes 0 and 1 of the observation box. How can I do it? Right now, what I'm doing is the following: box->AddValue(history[sta_id][i]), that is, adding values only at index 0 of the box, but I want to add values at both indexes sta_id 0 and sta_id 1 submitted by /u/Status_Morning_3000 [link] [comments]  ( 8 min )
  • Open

    Numbers don’t typically have many prime factors
    Suppose you select a 100-digit number at random. How many distinct prime factors do you think it would have? The answer is smaller than you might think: most likely between 5 and 6. The function ω(n) returns the number of distinct prime factors of n [1]. A theorem of Hardy and Ramanujan says that as […] Numbers don’t typically have many prime factors first appeared on John D. Cook.  ( 5 min )
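The Hardy–Ramanujan theorem says ω(n) is typically about ln ln n, which for a 100-digit number is ln(100 ln 10) ≈ 5.44, matching the 5-to-6 figure above. A quick sketch of both quantities:

```python
import math

def omega(n):
    """Number of distinct prime factors of n (trial division; fine for small n)."""
    count, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            count += 1
            while n % d == 0:      # strip repeated factors: they count once
                n //= d
        d += 1
    return count + (1 if n > 1 else 0)

print(omega(60))  # 60 = 2^2 * 3 * 5, so 3 distinct primes
# Typical omega of a 100-digit number, per Hardy-Ramanujan:
print(round(math.log(math.log(10 ** 100)), 2))  # 5.44
```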
  • Open

    An algorithm for checking for palindrome numbers
Introduction: Checking whether a number is a palindrome can be a common problem in programming, and there are different approaches to solving it. In this article, we will discuss an algorithm for checking if a number is a palindrome number. We will explain the steps involved in the algorithm, provide examples to illustrate it, and explore… Read More »An algorithm for checking for palindrome numbers The post An algorithm for checking for palindrome numbers appeared first on Data Science Central.  ( 21 min )
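One common arithmetic version of such an algorithm (the linked article's exact steps may differ) reverses the digits and compares the result with the original:

```python
def is_palindrome_number(n):
    """Reverse the digits arithmetically and compare; avoids string conversion."""
    if n < 0:
        return False            # negative numbers are not palindromes
    original, reversed_n = n, 0
    while n > 0:
        reversed_n = reversed_n * 10 + n % 10  # append the last digit of n
        n //= 10                               # drop the last digit of n
    return reversed_n == original

print(is_palindrome_number(12321))  # True
print(is_palindrome_number(123))    # False
```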

  • Open

    Research Focus: Week of June 19, 2023
    In this issue: Our new Responsible AI Maturity Model; FoundWright helps “re-find” web pages; Trace-guided Inductive Synthesis of Recursive Functional Programs; a wait-free algorithm for weak reference counting; and new research on concurrency testing. The post Research Focus: Week of June 19, 2023 appeared first on Microsoft Research.  ( 11 min )
  • Open

    Projected $200,000 in Prizes AI Hackathon (Free to Enter)
Hey everyone, We're hosting a free-to-enter, virtual hackathon, All Inclusive Hacks, centered on fostering creative solutions to strengthen inclusivity in AI development. It will be hosted during mid-to-late August, and we expect the prize pool to be in the range of $200k to $300k and to be sponsored by major tech corporations, including Google, OpenAI (creator of ChatGPT), and Microsoft. This is open to everyone aged 13+ regardless of experience in programming. We will have workshops to provide educational resources for our participants. Why should you join this hackathon? This hackathon is a great opportunity to: earn loads of prizes; meet peers with similar interests; improve your resume for jobs, internships, and college or grad school admissions; improve your grasp of artificial intelligence (this is especially important due to the increasing prevalence of AI in the industry); and learn from our workshops, which will be hosted by leading figures in artificial intelligence. Sign Up Link: https://all-inclusive-hacks.devpost.com/ submitted by /u/AllInclusiveHacks [link] [comments]  ( 8 min )
    Any Deep ReLU Network is Shallow
    submitted by /u/nickb [link] [comments]  ( 8 min )
    What Is a Transformer Model?
    submitted by /u/nickb [link] [comments]  ( 8 min )
    Mr kitty - After Dark [Neural Net v27.00]
    submitted by /u/thebeardedmojo [link] [comments]  ( 8 min )
  • Open

    Accelerate time to business insights with the Amazon SageMaker Data Wrangler direct connection to Snowflake
    Amazon SageMaker Data Wrangler is a single visual interface that reduces the time required to prepare data and perform feature engineering from weeks to minutes with the ability to select and clean data, create features, and automate data preparation in machine learning (ML) workflows without writing any code. SageMaker Data Wrangler supports Snowflake, a popular […]  ( 12 min )
    Deploy a serverless ML inference endpoint of large language models using FastAPI, AWS Lambda, and AWS CDK
    For data scientists, moving machine learning (ML) models from proof of concept to production often presents a significant challenge. One of the main challenges can be deploying a well-performing, locally trained model to the cloud for inference and use in other applications. It can be cumbersome to manage the process, but with the right tool, […]  ( 10 min )
  • Open

    Masters thesis about improving an implementation
    I'm a masters student researching RL for a niche application. At the start of my program, I encountered a paper that was very interesting to me, and I was determined to expand on the work. I quickly found that their code was challenging to understand, because code quality was not their focus. It uses messy TF 1.x, often reinvents the wheel, and it is not optimized for performance. I ended up re-implementing the code from scratch, trying my best to follow community standards and good coding practices, which was a painstaking yet insightful journey. I made various improvements, e.g. I was able to decrease training time from a few days to a couple of hours. In my opinion, what I've done can be helpful for other researchers in this domain. However, my advisor is skeptical that my contributions are novel enough for the scientific community, especially since he would like me to eventually publish a paper. I see where he's coming from, as this is more so engineering than science. His background is in OR, not ML/RL, so I'm interested to hear some of your thoughts, r/reinforcementlearning. Is this kind of work useful? What's a good way for me to package it? Cheers! submitted by /u/-gold-panda- [link] [comments]  ( 8 min )
PPO ignores high rewards in deterministic system
Hello, Please excuse my naivete and the fact that I don't really seem to understand how on-policy learning works. Check this training out (x-axis: episodes; y-axis: cumulative reward, max 400). One thing that I see is that the critic may be overfitting to strongly underestimate state values? https://preview.redd.it/vy14cw5deq7b1.png?width=190&format=png&auto=webp&s=20090c3ff2fcad0538f5b6fee3f08e21fa41b284 You see, my environment/system is deterministic (and I have an exploration policy in the beginning), yet I can't get my agent to reproduce the steps it took to receive high rewards in the beginning. Instead it gets locked into sub-optimal reward behaviour, here around ~0, but sometimes it can be in the middle; it never reproduces the high-reward actions. I tried multiplying the loss by the value of the reward or changing hyperparameters and the learning rate; this just seems to get stuck even quicker. How can I modify my PPO to be less oblivious to these high-reward episodes? I calculate my loss like this:
    ratios = torch.exp(policy - policy_old)
    surr1 = ratios * advantages
    surr2 = torch.clamp(ratios, 1.0 - clip_epsilon, 1.0 + clip_epsilon) * advantages
    actor_loss = -torch.min(surr1, surr2).mean()
where advantages is a tensor of discounted advantages for each step submitted by /u/Vae94 [link] [comments]  ( 9 min )
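For reference, the clipped surrogate in a PPO loss like the one in this post caps how much a single large advantage can move the policy, which is one reason PPO can seem oblivious to rare high-reward episodes. A pure-Python sketch on a single transition:

```python
# PPO clipped surrogate on one transition. For a positive advantage the
# objective stops growing once the probability ratio exceeds 1 + eps, so a
# rare high-reward episode only nudges the policy a clipped amount per update.
def clipped_surrogate(ratio, advantage, eps=0.2):
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps) * advantage
    return min(unclipped, clipped)

print(clipped_surrogate(1.5, 2.0))  # capped at ratio 1.2 -> 2.4, not 3.0
print(clipped_surrogate(0.9, 2.0))  # inside the clip range -> 1.8, untouched
```

Common remedies for the behaviour described above include more epochs per batch (within the clip's trust region), higher entropy bonus to keep exploring, and checking that advantages are normalized so the rare high-reward episode isn't drowned out.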
  • Open

    Am I crazy or does this look AI generated?
    submitted by /u/Mayo4Life [link] [comments]  ( 8 min )
    AI chatbot specially designed for 'counter-trolling' or 'counter-cyberbullying'?
I am sure this has been discussed for a while, but has anyone actually deployed such a thing? So basically the AI is smart enough to analyze the trolls and bullies and then perform counterattacks against them. It doesn't just analyze their posting history; it also analyzes visual information about them. It is also capable of scanning the web for relevant information about them to use against them. I am not sure if this is within legal boundaries, though. submitted by /u/Absolute-Nobody0079 [link] [comments]  ( 8 min )
🎨 Midjourney v5.2 is truly mind-blowing.
Midjourney, an AI image generator, has released its latest version, 5.2, with exciting new features, and it is mind-blowing. The "Zoom Out" feature allows users to expand the field of view, with options to zoom out by 1.5x and 2x. Additionally, the update includes a "Make Square" option and a "Custom Zoom" tool for altering text prompts and aspect ratios. The new version also introduces a "shorten" command, enabling users to analyze the impact of words on the generated image. In tests, Midjourney performed well with zooming out but had difficulties with changing aspect ratios. Midjourney v5.2 is accessible through a Discord channel and pricing plans. Worth playing with. Check it out: yarocelis.org submitted by /u/Yavero [link] [comments]  ( 8 min )
    AI — weekly megathread!
    This week in AI - partnered with aibrews.com feel free to follow their newsletter News & Insights Stability AI has announced SDXL 0.9, a significant upgrade to their text-to-image model suite that can generate hyper-realistic images. SDXL 0.9 has one of the largest parameter counts in open-source image models (3.5B) and is available on the Clipdrop by Stability AI platform [Details]. Google presents AudioPaLM, a Large Language Model that can speak and listen. AudioPaLM fuses text-based PaLM-2 and speech-based AudioLM models into a unified multimodal architecture that can process and generate text and speech [Examples | paper]. Google researchers present DreamHuman, a method to generate realistic animatable 3D human avatar models solely from textual descriptions [Details]. Meta intro…  ( 10 min )
    Intel Discloses New Details On Meteor Lake VPU Block, Lays Out Vision For Client AI
    submitted by /u/eberkut [link] [comments]  ( 8 min )
    Any good AI for presentation generation?
    Yesterday I tried 15 different AI tools, such as tome or magicwizard, but all of them are "ok" at best. None can make adjustments to the generated presentation via prompts; it has to be done manually, which defeats the purpose. So, do any of you know of a good AI that makes great presentations and can make adjustments based on prompts? Thank you! Edit: before anyone suggests it, yes, I tried magicslides and it's not that great. It would also be great if you could give the AI documents such as PDFs and have it make presentations from that content. submitted by /u/Rule34withRule16 [link] [comments]  ( 8 min )
    How AI can distort human beliefs
    submitted by /u/bartturner [link] [comments]  ( 8 min )
    Any current AI Scientists that I can ask a few questions to?
    I am a career changer from London transitioning into AI (from econ and math). I am planning to do a master's in the States, and I keep hearing the US job market is a dream compared to the UK one, as well as other things I don't know how to verify since I am an outsider both in terms of country and industry. Are there any current AI scientists who would be willing to answer a few questions? I keep hearing the US is a much better market for jobs and experience. Is this true? On a similar note: is there any specific university or graduate degree that is particularly respected in the industry? For example, the Columbia MFE is really good for entering buy-side finance (an analogy I am familiar with because of my econ background), but I am not sure if there are similar target universities for tech/AI/FAANG. My plan is to become an AI scientist (mainly on the corporate side) and that's what I am working towards. Cheers to anyone who responds, you have my blessings and I will treat you to food if we ever meet up! submitted by /u/Icy-Bid-5585 [link] [comments]  ( 8 min )
    China: Artificial Intelligence Searches for Rare Earths
    Looks like artificial intelligence also helps with mining. Interesting to see; maybe also in oil and gas? 🤔 From what I know from oil workers in my family, the search part takes ages while the extraction part is rather simple. submitted by /u/easyskinseasylife [link] [comments]  ( 8 min )
    One-Minute Daily AI News 6/22/2023
    DeepMind’s latest paper introduces a self-improving AI agent for robotics, RoboCat, that learns to perform a variety of tasks across different arms, and then self-generates new training data to improve its technique.[1] OpenAI has lobbied for significant elements of the most comprehensive AI legislation in the world—the E.U.’s AI Act—to be watered down in ways that would reduce the regulatory burden on the company.[2] In an apparent bid to assert its presence in the rapidly expanding AI landscape, Amazon Web Services (AWS)—the retail giant’s sizable cloud computing arm—has introduced a fund of $100 million to bolster startups focusing on generative AI.[3] Over the past year, more than 100,000 login credentials to the popular artificial intelligence chatbot ChatGPT have been leaked and traded on the dark web, according to a Singaporean cybersecurity firm.[4] Sources: [1] https://www.deepmind.com/blog/robocat-a-self-improving-robotic-agent [2] https://time.com/6288245/openai-eu-lobbying-ai-act/ [3] https://decrypt.co/145879/amazon-pledges-100-million-generative-ai-startups [4] https://cointelegraph.com/news/dark-web-chatgpt-logins-leaked-group-ib-open-ai submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )

    Preference learning with automated feedback for cache eviction
    Posted by Ramki Gummadi, Software Engineer, Google and Kevin Chen, Software Engineer, YouTube Caching is a ubiquitous idea in computer science that significantly improves the performance of storage and retrieval systems by storing a subset of popular items closer to the client based on request patterns. An important algorithmic piece of cache management is the decision policy used for dynamically updating the set of items being stored, which has been extensively optimized over several decades, resulting in several efficient and robust heuristics. While applying machine learning to cache policies has shown promising results in recent years (e.g., LRB, LHD, storage applications), it remains a challenge to outperform robust heuristics in a way that can generalize reliably beyond benchmar…  ( 93 min )

    Every factorial is a power
    The previous post mentioned that 24! ≈ 10^24 and 25! ≈ 10^25. For every n, there is some base b such that n! = b^n. For example, 30! ≈ 12^30. It’s easy to find b: What’s interesting is that b is very nearly a linear function of n. In hindsight it’s clear that this should […] Every factorial is a power first appeared on John D. Cook.  ( 5 min )
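The base b is simply the n-th root of n!, which can be checked in a couple of lines with the standard library (working in log space to avoid overflow). The near-linearity mentioned in the post follows from Stirling's approximation, which gives b ≈ n/e:

```python
import math

def base(n):
    """b such that n! = b**n, i.e. b = (n!)**(1/n)."""
    # lgamma(n + 1) = ln(n!); taking the root in log space avoids overflow
    return math.exp(math.lgamma(n + 1) / n)

# 30! is roughly 12**30, matching the post's example.
print(round(base(30)))   # -> 12
```

By Stirling, n! ≈ (n/e)^n √(2πn), so b ≈ (n/e)·(2πn)^(1/2n), and the second factor tends to 1: hence b grows almost linearly in n with slope about 1/e.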

    Deep Learning Digs Deep: AI Unveils New Large-Scale Images in Peruvian Desert
    Researchers at Yamagata University in Japan have harnessed AI to uncover four previously unseen geoglyphs — images on the ground, some as wide as 1,200 feet, made using the land’s elements — in Nazca, a seven-hour drive south of Lima, Peru. The geoglyphs — a humanoid, a pair of legs, a fish and a bird Read article >  ( 4 min )

    scikit-fda: A Python Package for Functional Data Analysis. (arXiv:2211.02566v2 [stat.CO] UPDATED)
    The library scikit-fda is a Python package for Functional Data Analysis (FDA). It provides a comprehensive set of tools for representation, preprocessing, and exploratory analysis of functional data. The library is built upon and integrated in Python's scientific ecosystem. In particular, it conforms to the scikit-learn application programming interface so as to take advantage of the functionality for machine learning provided by this package: pipelines, model selection, and hyperparameter tuning, among others. The scikit-fda package has been released as free and open-source software under a 3-Clause BSD license and is open to contributions from the FDA community. The library's extensive documentation includes step-by-step tutorials and detailed examples of use.
    Decentralized Online Federated G-Network Learning for Lightweight Intrusion Detection. (arXiv:2306.13029v1 [cs.CR])
    Cyberattacks are increasingly threatening networked systems, often with the emergence of new types of unknown (zero-day) attacks and the rise of vulnerable devices. While Machine Learning (ML)-based Intrusion Detection Systems (IDSs) have been shown to be extremely promising in detecting these attacks, the need to learn large amounts of labelled data often limits the applicability of ML-based IDSs to cybersystems that only have access to private local data. To address this issue, this paper proposes a novel Decentralized and Online Federated Learning Intrusion Detection (DOF-ID) architecture. DOF-ID is a collaborative learning system that allows each IDS used for a cybersystem to learn from experience gained in other cybersystems in addition to its own local data without violating the data privacy of other systems. As the performance evaluation results using public Kitsune and Bot-IoT datasets show, DOF-ID significantly improves the intrusion detection performance in all collaborating nodes simultaneously with acceptable computation time for online learning.
    Explainable Representations for Relation Prediction in Knowledge Graphs. (arXiv:2306.12687v1 [cs.LG])
    Knowledge graphs represent real-world entities and their relations in a semantically-rich structure supported by ontologies. Exploring this data with machine learning methods often relies on knowledge graph embeddings, which produce latent representations of entities that preserve structural and local graph neighbourhood properties, but sacrifice explainability. However, in tasks such as link or relation prediction, understanding which specific features better explain a relation is crucial to support complex or critical applications. We propose SEEK, a novel approach for explainable representations to support relation prediction in knowledge graphs. It is based on identifying relevant shared semantic aspects (i.e., subgraphs) between entities and learning representations for each subgraph, producing a multi-faceted and explainable representation. We evaluate SEEK on two real-world highly complex relation prediction tasks: protein-protein interaction prediction and gene-disease association prediction. Our extensive analysis using established benchmarks demonstrates that SEEK achieves significantly better performance than standard learning representation methods while identifying both sufficient and necessary explanations based on shared semantic aspects.
    Frouros: A Python library for drift detection in machine learning systems. (arXiv:2208.06868v2 [cs.LG] UPDATED)
    Frouros is an open-source Python library capable of detecting drift in machine learning systems. It provides a combination of classical and more recent algorithms for drift detection: both concept and data drift. We have designed it with the objective of making it compatible with any machine learning framework and easily adaptable to real-world use cases. The library is developed following a set of best development and continuous integration practices to ensure ease of maintenance and extensibility. The source code is available at https://github.com/IFCA/frouros.
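To illustrate the kind of classical concept-drift detector such a library provides, here is a minimal Page-Hinkley test for an upward shift in the mean of a stream. This is a generic textbook sketch, not the Frouros API; the default parameters are illustrative:

```python
class PageHinkley:
    """Minimal Page-Hinkley test for upward mean drift in a data stream
    (classical sketch for illustration; not the Frouros API)."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta, self.threshold = delta, threshold
        self.mean, self.n = 0.0, 0
        self.cum, self.cum_min = 0.0, 0.0

    def update(self, x):
        """Feed one observation; return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n            # running mean
        self.cum += x - self.mean - self.delta           # cumulative deviation
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold  # drift flag
```

Feeding a stream whose mean jumps (say, 100 samples of 0.0 followed by samples of 1.0) makes the detector fire within a handful of observations after the change point.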
    SQ Lower Bounds for Learning Bounded Covariance GMMs. (arXiv:2306.13057v1 [cs.LG])
    We study the complexity of learning mixtures of separated Gaussians with common unknown bounded covariance matrix. Specifically, we focus on learning Gaussian mixture models (GMMs) on $\mathbb{R}^d$ of the form $P= \sum_{i=1}^k w_i \mathcal{N}(\boldsymbol \mu_i,\mathbf \Sigma_i)$, where $\mathbf \Sigma_i = \mathbf \Sigma \preceq \mathbf I$ and $\min_{i \neq j} \| \boldsymbol \mu_i - \boldsymbol \mu_j\|_2 \geq k^\epsilon$ for some $\epsilon>0$. Known learning algorithms for this family of GMMs have complexity $(dk)^{O(1/\epsilon)}$. In this work, we prove that any Statistical Query (SQ) algorithm for this problem requires complexity at least $d^{\Omega(1/\epsilon)}$. In the special case where the separation is on the order of $k^{1/2}$, we additionally obtain fine-grained SQ lower bounds with the correct exponent. Our SQ lower bounds imply similar lower bounds for low-degree polynomial tests. Conceptually, our results provide evidence that known algorithms for this problem are nearly best possible.
    MultiTASC: A Multi-Tenancy-Aware Scheduler for Cascaded DNN Inference at the Consumer Edge. (arXiv:2306.12830v1 [cs.LG])
    Cascade systems comprise a two-model sequence, with a lightweight model processing all samples and a heavier, higher-accuracy model conditionally refining harder samples to improve accuracy. By placing the light model on the device side and the heavy model on a server, model cascades constitute a widely used distributed inference approach. With the rapid expansion of intelligent indoor environments, such as smart homes, the new setting of Multi-Device Cascade is emerging where multiple and diverse devices are to simultaneously use a shared heavy model on the same server, typically located within or close to the consumer environment. This work presents MultiTASC, a multi-tenancy-aware scheduler that adaptively controls the forwarding decision functions of the devices in order to maximize the system throughput, while sustaining high accuracy and low latency. By explicitly considering device heterogeneity, our scheduler improves the latency service-level objective (SLO) satisfaction rate by 20-25 percentage points (pp) over state-of-the-art cascade methods in highly heterogeneous setups, while serving over 40 devices, showcasing its scalability.
    Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model. (arXiv:2306.12968v1 [cs.SI])
    We consider the problem of recovering hidden communities in the Labeled Stochastic Block Model (LSBM) with a finite number of clusters, where cluster sizes grow linearly with the total number $n$ of items. In the LSBM, a label is (independently) observed for each pair of items. Our objective is to devise an efficient algorithm that recovers clusters using the observed labels. To this end, we revisit instance-specific lower bounds on the expected number of misclassified items satisfied by any clustering algorithm. We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability. IAC consists of a one-time spectral clustering algorithm followed by an iterative likelihood-based cluster assignment improvement. This approach is based on the instance-specific lower bound and does not require any model parameters, including the number of clusters. By performing the spectral clustering only once, IAC maintains an overall computational complexity of $\mathcal{O}(n \text{polylog}(n))$. We illustrate the effectiveness of our approach through numerical experiments.
    Corrector Operator to Enhance Accuracy and Reliability of Neural Operator Surrogates of Nonlinear Variational Boundary-Value Problems. (arXiv:2306.12047v2 [math.NA] UPDATED)
    This work focuses on developing methods for approximating the solution operators of a class of parametric partial differential equations via neural operators. Neural operators have several challenges, including the issue of generating appropriate training data, cost-accuracy trade-offs, and nontrivial hyperparameter tuning. The unpredictability of the accuracy of neural operators impacts their applications in downstream problems of inference, optimization, and control. A framework is proposed based on the linear variational problem that gives the correction to the prediction furnished by neural operators. The operator associated with the corrector problem is referred to as the corrector operator. Numerical results involving a nonlinear diffusion model in two dimensions with PCANet-type neural operators show almost two orders of increase in the accuracy of approximations when neural operators are corrected using the proposed scheme. Further, topology optimization involving a nonlinear diffusion model is considered to highlight the limitations of neural operators and the efficacy of the correction scheme. Optimizers with neural operator surrogates are seen to make significant errors (as high as 80 percent). However, the errors are much lower (below 7 percent) when neural operators are corrected following the proposed method.
    Fitted Value Iteration Methods for Bicausal Optimal Transport. (arXiv:2306.12658v1 [stat.ML])
    We develop a fitted value iteration (FVI) method to compute bicausal optimal transport (OT) where couplings have an adapted structure. Based on the dynamic programming formulation, FVI adopts a function class to approximate the value functions in bicausal OT. Under the concentrability condition and approximate completeness assumption, we prove the sample complexity using (local) Rademacher complexity. Furthermore, we demonstrate that multilayer neural networks with appropriate structures satisfy the crucial assumptions required in sample complexity proofs. Numerical experiments reveal that FVI outperforms linear programming and adapted Sinkhorn methods in scalability as the time horizon increases, while still maintaining acceptable accuracy.
    Quantum Pufferfish Privacy: A Flexible Privacy Framework for Quantum Systems. (arXiv:2306.13054v1 [quant-ph])
    We propose a versatile privacy framework for quantum systems, termed quantum pufferfish privacy (QPP). Inspired by classical pufferfish privacy, our formulation generalizes and addresses limitations of quantum differential privacy by offering flexibility in specifying private information, feasible measurements, and domain knowledge. We show that QPP can be equivalently formulated in terms of the Datta-Leditzky information spectrum divergence, thus providing the first operational interpretation thereof. We reformulate this divergence as a semi-definite program and derive several properties of it, which are then used to prove convexity, composability, and post-processing of QPP mechanisms. Parameters that guarantee QPP of the depolarization mechanism are also derived. We analyze the privacy-utility tradeoff of general QPP mechanisms and, again, study the depolarization mechanism as an explicit instance. The QPP framework is then applied to privacy auditing for identifying privacy violations via a hypothesis testing pipeline that leverages quantum algorithms. Connections to quantum fairness and other quantum divergences are also explored and several variants of QPP are examined.
    ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings. (arXiv:2305.11554v2 [cs.CL] UPDATED)
    Augmenting large language models (LLMs) with external tools has emerged as a promising approach to solving complex problems. However, traditional methods, which finetune LLMs with tool demonstration data, can be both costly and restricted to a predefined set of tools. Recent in-context learning paradigm alleviates these issues, but the limited context length only allows for a few shots of demonstrations, leading to suboptimal understandings of the tools. Moreover, when there are numerous tools to choose from, in-context learning could completely fail to work. In this paper, we propose an alternative approach, $\textbf{ToolkenGPT}$, which combines the benefits of both sides. Our approach represents each $\underline{tool}$ as a to$\underline{ken}$ ($\textit{toolken}$) and learns an embedding for it, enabling tool calls in the same way as generating a regular word token. Once a toolken is triggered, the LLM is prompted to complete arguments for the tool to execute. ToolkenGPT offers the flexibility to plug in an arbitrary number of tools by expanding the set of toolkens on the fly. In addition, it improves tool use by allowing extensive demonstration data for learning the toolken embeddings. In diverse domains, including numerical reasoning, knowledge-based question answering, and embodied plan generation, our approach effectively augments LLMs with tools and substantially outperforms various latest baselines. ToolkenGPT demonstrates the promising ability to use relevant tools from a large tool set in complex scenarios.
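The core mechanism described in the abstract can be sketched in a few lines: tool embeddings are appended to the output vocabulary, so selecting a tool is just next-token prediction over an enlarged softmax. Everything below (names, sizes, the two-dimensional hidden state) is illustrative, not the paper's code:

```python
import math

def next_token_probs(hidden, word_emb, tool_emb):
    """Score a hidden state against word rows plus toolken rows,
    then softmax over the combined vocabulary."""
    logits = [sum(h * e for h, e in zip(hidden, emb))
              for emb in word_emb + tool_emb]
    m = max(logits)                           # subtract max for stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]              # probs over |words| + |tools|

hidden = [1.0, 0.0]                 # toy LM hidden state
word_emb = [[0.2, 0.1], [0.1, 0.3]] # two ordinary vocabulary rows
tool_emb = [[2.0, 0.0]]             # one learned toolken, e.g. a calculator
probs = next_token_probs(hidden, word_emb, tool_emb)
# Indices >= len(word_emb) are toolkens; when one wins, the LM would be
# re-prompted to generate the tool's arguments, per the abstract.
```

Because only the toolken embeddings are learned, adding a tool is just appending a row, which is the "plug in an arbitrary number of tools by expanding the set of toolkens" property the abstract highlights.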
    Efficient Deep Spiking Multi-Layer Perceptrons with Multiplication-Free Inference. (arXiv:2306.12465v1 [cs.NE])
    Advancements in adapting deep convolution architectures for Spiking Neural Networks (SNNs) have significantly enhanced image classification performance and reduced computational burdens. However, the inability of Multiplication-Free Inference (MFI) to harmonize with attention and transformer mechanisms, which are critical to superior performance on high-resolution vision tasks, imposes limitations on these gains. To address this, our research explores a new pathway, drawing inspiration from the progress made in Multi-Layer Perceptrons (MLPs). We propose an innovative spiking MLP architecture that uses batch normalization to retain MFI compatibility and introduces a spiking patch encoding layer to reinforce local feature extraction capabilities. As a result, we establish an efficient multi-stage spiking MLP network that effectively blends global receptive fields with local feature extraction for comprehensive spike-based computation. Without relying on pre-training or sophisticated SNN training techniques, our network secures a top-1 accuracy of 66.39% on the ImageNet-1K dataset, surpassing the directly trained spiking ResNet-34 by 2.67%. Furthermore, we curtail computational costs, model capacity, and simulation steps. An expanded version of our network challenges the performance of the spiking VGG-16 network with a 71.64% top-1 accuracy, all while operating with a model capacity 2.1 times smaller. Our findings accentuate the potential of our deep SNN architecture in seamlessly integrating global and local learning abilities. Interestingly, the trained receptive field in our network mirrors the activity patterns of cortical cells.
    Siamese SIREN: Audio Compression with Implicit Neural Representations. (arXiv:2306.12957v1 [cs.SD])
    Implicit Neural Representations (INRs) have emerged as a promising method for representing diverse data modalities, including 3D shapes, images, and audio. While recent research has demonstrated successful applications of INRs in image and 3D shape compression, their potential for audio compression remains largely unexplored. Motivated by this, we present a preliminary investigation into the use of INRs for audio compression. Our study introduces Siamese SIREN, a novel approach based on the popular SIREN architecture. Our experimental results indicate that Siamese SIREN achieves superior audio reconstruction fidelity while utilizing fewer network parameters compared to previous INR architectures.
    The False Dawn: Reevaluating Google's Reinforcement Learning for Chip Macro Placement. (arXiv:2306.09633v3 [cs.LG] CROSS LISTED)
    Reinforcement learning (RL) for physical design of silicon chips in a Google 2021 Nature paper stirred controversy due to poorly documented claims that raised eyebrows and attracted critical media coverage. The Nature paper withheld most inputs needed to produce reported results and some critical steps in the methodology. But two separate evaluations filled in the gaps and demonstrated that Google RL lags behind human designers, behind a well-known algorithm (Simulated Annealing), and also behind generally-available commercial software. Crosschecked data indicate that the integrity of the Nature paper is substantially undermined owing to errors in the conduct, analysis and reporting.
    Text-Driven Foley Sound Generation With Latent Diffusion Model. (arXiv:2306.10359v2 [cs.SD] UPDATED)
    Foley sound generation aims to synthesise the background sound for multimedia content. Previous models usually employ a large development set with labels as input (e.g., single numbers or one-hot vector). In this work, we propose a diffusion model based system for Foley sound generation with text conditions. To alleviate the data scarcity issue, our model is initially pre-trained with large-scale datasets and fine-tuned to this task via transfer learning using the contrastive language-audio pretraining (CLAP) technique. We have observed that the feature embedding extracted by the text encoder can significantly affect the performance of the generation model. Hence, we introduce a trainable layer after the encoder to improve the text embedding produced by the encoder. In addition, we further refine the generated waveform by generating multiple candidate audio clips simultaneously and selecting the best one, which is determined in terms of the similarity score between the embedding of the candidate clips and the embedding of the target text label. Using the proposed method, our system ranks ${1}^{st}$ among the systems submitted to DCASE Challenge 2023 Task 7. The results of the ablation studies illustrate that the proposed techniques significantly improve sound generation performance. The codes for implementing the proposed system are available online.
    Beyond OOD State Actions: Supported Cross-Domain Offline Reinforcement Learning. (arXiv:2306.12755v1 [cs.LG])
    Offline reinforcement learning (RL) aims to learn a policy using only pre-collected and fixed data. Although avoiding the time-consuming online interactions in RL, it poses challenges for out-of-distribution (OOD) state actions and often suffers from data inefficiency for training. Despite many efforts being devoted to addressing OOD state actions, the latter (data inefficiency) receives little attention in offline RL. To address this, this paper proposes the cross-domain offline RL, which assumes offline data incorporate additional source-domain data from varying transition dynamics (environments), and expects it to contribute to the offline data efficiency. To do so, we identify a new challenge of OOD transition dynamics, beyond the common OOD state actions issue, when utilizing cross-domain offline data. Then, we propose our method BOSA, which employs two support-constrained objectives to address the above OOD issues. Through extensive experiments in the cross-domain offline RL setting, we demonstrate BOSA can greatly improve offline data efficiency: using only 10% of the target data, BOSA could achieve 74.4% of the SOTA offline RL performance that uses 100% of the target data. Additionally, we also show BOSA can be effortlessly plugged into model-based offline RL and noising data augmentation techniques (used for generating source-domain data), which naturally avoids the potential dynamics mismatch between target-domain data and newly generated source-domain data.
    Multi-Task Learning with Loop Specific Attention for CDR Structure Prediction. (arXiv:2306.13045v1 [cs.LG])
    The Complementarity Determining Region (CDR) structure prediction of loops in antibody engineering has attracted a lot of attention from researchers. When designing antibodies, a main challenge is to predict the CDR structure of the H3 loop. Compared with the other CDR loops, that is, the H1 and H2 loops, the CDR structure of the H3 loop is more challenging due to its varying length and flexible structure. In this paper, we propose a Multi-task learning model with Loop Specific Attention, namely MLSA. In particular, to the best of our knowledge we are the first to jointly learn the three CDR loops, via a novel multi-task learning strategy. In addition, to account for the structural and functional similarities and differences of the three CDR loops, we propose a loop specific attention mechanism to control the influence of each CDR loop on the training of MLSA. Our experimental evaluation on widely used benchmark data shows that the proposed MLSA method significantly reduces the prediction error of the CDR structure of the H3 loop, by at least 19%, when compared with other baseline strategies. Finally, for reproduction purposes we make the implementation of MLSA publicly available at https://anonymous.4open.science/r/MLSA-2442/.
    Investigating the effect of sub-word segmentation on the performance of transformer language models. (arXiv:2305.05480v2 [cs.CL] UPDATED)
    We would like to explore how morphemes can affect the performance of a language model. We trained GPT-2 and BERT models with StateMorph, a morpheme segmentation algorithm, for both Finnish and Russian. As a comparison, we also trained models with BPE and Morfessor. Our preliminary results show that StateMorph can help the model converge more efficiently and achieve a better validation score.
    PyKoopman: A Python Package for Data-Driven Approximation of the Koopman Operator. (arXiv:2306.12962v1 [eess.SY])
    PyKoopman is a Python package for the data-driven approximation of the Koopman operator associated with a dynamical system. The Koopman operator is a principled linear embedding of nonlinear dynamics and facilitates the prediction, estimation, and control of strongly nonlinear dynamics using linear systems theory. In particular, PyKoopman provides tools for data-driven system identification for unforced and actuated systems that build on the equation-free dynamic mode decomposition (DMD) and its variants. In this work, we provide a brief description of the mathematical underpinnings of the Koopman operator, an overview and demonstration of the features implemented in PyKoopman (with code examples), practical advice for users, and a list of potential extensions to PyKoopman. Software is available at this http URL
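As a sketch of the least-squares operator estimate that dynamic mode decomposition (and hence PyKoopman) builds on, the snippet below recovers a 2-state linear dynamics matrix from snapshot pairs via A ≈ Y X^T (X X^T)^-1. Pure Python with hand-rolled 2x2 linear algebra; illustrative only, not the PyKoopman API:

```python
def matmul(A, B):
    """Naive matrix product of row-major lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# True (pretend-unknown) dynamics x_{k+1} = A_true x_k
A_true = [[0.9, 0.1], [0.0, 0.8]]

# Collect snapshot pairs (x_k, x_{k+1}) along one trajectory.
x = [1.0, 1.0]
X, Y = [], []
for _ in range(5):
    y = [sum(A_true[i][j] * x[j] for j in range(2)) for i in range(2)]
    X.append(x); Y.append(y)
    x = y

Xm, Ym = transpose(X), transpose(Y)   # 2 x m snapshot matrices
# DMD operator estimate: A_hat = Y X^T (X X^T)^{-1}  (right pseudoinverse)
A_hat = matmul(matmul(Ym, transpose(Xm)), inv2(matmul(Xm, transpose(Xm))))
```

With noise-free data generated by a truly linear system, the least-squares fit recovers A exactly (up to floating point); the point of Koopman-based tooling is to apply the same linear machinery to nonlinear systems after lifting them into a feature space.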
    Taming Multi-Agent Reinforcement Learning with Estimator Variance Reduction. (arXiv:2209.01054v2 [cs.MA] UPDATED)
    Centralised training with decentralised execution (CT-DE) serves as the foundation of many leading multi-agent reinforcement learning (MARL) algorithms. Despite its popularity, it suffers from a critical drawback due to its reliance on learning from a single sample of the joint-action at a given state. As agents explore and update their policies during training, these single samples may poorly represent the actual joint-policy of the system of agents leading to high variance gradient estimates that hinder learning. To address this problem, we propose an enhancement tool that accommodates any actor-critic MARL method. Our framework, Performance Enhancing Reinforcement Learning Apparatus (PERLA), introduces a sampling technique of the agents' joint-policy into the critics while the agents train. This leads to TD updates that closely approximate the true expected value under the current joint-policy rather than estimates from a single sample of the joint-action at a given state. This produces low variance and precise estimates of expected returns, minimising the variance in the critic estimators which typically hinders learning. Moreover, as we demonstrate, by eliminating much of the critic variance from the single sampling of the joint policy, PERLA enables CT-DE methods to scale more efficiently with the number of agents. Theoretically, we prove that PERLA reduces variance in value estimates similar to that of decentralised training while maintaining the benefits of centralised training. Empirically, we demonstrate PERLA's superior performance and ability to reduce estimator variance in a range of benchmarks including Multi-agent Mujoco, and StarCraft II Multi-agent Challenge.
    Precision psychiatry: predicting predictability. (arXiv:2306.12462v1 [q-bio.QM])
    Precision psychiatry is an emerging field that aims to provide individualized approaches to mental health care. Multivariate analysis and machine learning are used to create outcome prediction models based on clinical data such as demographics, symptom assessments, genetic information, and brain imaging. While much emphasis has been placed on technical innovation, the complex and varied nature of mental health presents significant challenges to the successful implementation of these models. From this perspective, I review ten challenges in the field of precision psychiatry, including the need for studies on real-world populations and realistic clinical outcome definitions, consideration of treatment-related factors such as placebo effects and non-adherence to prescriptions. Fairness, prospective validation in comparison to current practice and implementation studies of prediction models are other key issues that are currently understudied. A shift is proposed from retrospective studies based on linear and static concepts of disease towards prospective research that considers the importance of contextual factors and the dynamic and complex nature of mental health.
    Data augmentation for recommender system: A semi-supervised approach using maximum margin matrix factorization. (arXiv:2306.13050v1 [cs.IR])
    Collaborative filtering (CF) has become a popular method for developing recommender systems (RS), where the ratings of a user for new items are predicted based on her past preferences and the available preference information of other users. Despite the popularity of CF-based methods, their performance is often greatly limited by the sparsity of observed entries. In this study, we explore the data augmentation and refinement aspects of Maximum Margin Matrix Factorization (MMMF), a widely accepted CF technique for rating prediction, which have not been investigated before. We exploit the inherent characteristics of CF algorithms to assess the confidence level of individual ratings and propose a semi-supervised approach for rating augmentation based on self-training. We hypothesize that any CF algorithm's predictions with low confidence are due to some deficiency in the training data, and hence the performance of the algorithm can be improved by adopting a systematic data augmentation strategy. We iteratively use some of the ratings predicted with high confidence to augment the training data and remove low-confidence entries through a refinement process. By repeating this process, the system learns to improve prediction accuracy. Our method is experimentally evaluated on several state-of-the-art CF algorithms and leads to informative rating augmentation, improving the performance of the baseline approaches.
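    The confidence-gated self-training loop the abstract describes can be sketched generically: high-confidence predictions are added back to the training data (augmentation) and low-confidence ones are held out (refinement). The nearest-mean classifier, data, and threshold below are toy stand-ins for illustration only, not the MMMF model from the paper.

```python
# Toy self-training loop: pseudo-label only when the model is confident.
labeled = [(0.1, 0), (0.2, 0), (0.9, 1), (1.0, 1)]   # (value, class) pairs
unlabeled = [0.15, 0.5, 0.95, 0.55]
THRESHOLD = 0.8   # confidence required to accept a pseudo-label

def class_means(data):
    c0 = [x for x, y in data if y == 0]
    c1 = [x for x, y in data if y == 1]
    return sum(c0) / len(c0), sum(c1) / len(c1)

def predict(x, m0, m1):
    # Confidence: how decisively x sits nearer one class mean than the other.
    d0, d1 = abs(x - m0), abs(x - m1)
    label = 0 if d0 < d1 else 1
    conf = max(d0, d1) / (d0 + d1)   # ranges over (0.5, 1.0]
    return label, conf

for _ in range(3):   # a few self-training rounds
    m0, m1 = class_means(labeled)
    held_out = []
    for x in unlabeled:
        label, conf = predict(x, m0, m1)
        if conf >= THRESHOLD:
            labeled.append((x, label))   # augment the training data
        else:
            held_out.append(x)           # refine: keep low-confidence out
    unlabeled = held_out

print(sorted(x for x, _ in labeled))
```

On this toy data the two points near the class means are absorbed, while the two ambiguous mid-range points are kept out of training, mirroring the paper's augment/refine split.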
    $\kappa$HGCN: Tree-likeness Modeling via Continuous and Discrete Curvature Learning. (arXiv:2212.01793v2 [cs.LG] UPDATED)
    Tree-like structures, encompassing hierarchical structures and power-law distributions, are prevalent in real-world applications, including recommendation systems, ecosystems, financial networks, social networks, etc. Recently, the exploitation of hyperbolic space for tree-likeness modeling has garnered considerable attention owing to its exponentially growing volume. Compared to the flat Euclidean space, the curved hyperbolic space provides a more amenable embedding space, especially for datasets exhibiting implicit tree-like architectures. However, the intricate nature of real-world tree-like data presents a considerable challenge, as it frequently displays a heterogeneous composition of tree-like, flat, and circular regions. The direct embedding of such heterogeneous structures into a homogeneous embedding space (i.e., hyperbolic space) inevitably leads to heavy distortions. To mitigate this shortcoming, this study endeavors to explore the curvature between the discrete structure and the continuous learning space, aiming to encode the message conveyed by the network topology into the learning process, thereby improving tree-likeness modeling. To this end, a curvature-aware hyperbolic graph convolutional neural network, $\kappa$HGCN, is proposed, which utilizes the curvature to guide message passing and improve long-range propagation. Extensive experiments on node classification and link prediction tasks verify the superiority of the proposal, as it consistently outperforms various competitive models by a large margin.
    Exploring Antitrust and Platform Power in Generative AI. (arXiv:2306.11342v2 [cs.LG] UPDATED)
    The concentration of power in a few digital technology companies has become a subject of increasing interest in both academic and non-academic discussions. One of the most noteworthy contributions to the debate is Lina Khan's Amazon's Antitrust Paradox. In this work, Khan contends that Amazon has systematically exerted its dominance in online retail to eliminate competitors and subsequently charge above-market prices. This work contributed to Khan's appointment as the chair of the US Federal Trade Commission (FTC), one of the most influential antitrust organizations. Today, several ongoing antitrust lawsuits in the US and Europe involve major technology companies like Apple, Google/Alphabet, and Facebook/Meta. In the realm of generative AI, we are once again witnessing the same companies taking the lead in technological advancements, leaving little room for others to compete. This article examines the market dominance of these corporations in the technology stack behind generative AI from an antitrust law perspective.
    Learnability and Algorithm for Continual Learning. (arXiv:2306.12646v1 [cs.LG])
    This paper studies the challenging continual learning (CL) setting of Class Incremental Learning (CIL). CIL learns a sequence of tasks consisting of disjoint sets of concepts or classes. At any time, a single model is built that can be applied to predict/classify test instances of any classes learned thus far, without any task-related information being provided for each test instance. Although many techniques have been proposed for CIL, they are mostly empirical. It has been shown recently that a strong CIL system needs strong within-task prediction (WP) and strong out-of-distribution (OOD) detection for each task. However, it is still not known whether CIL is actually learnable. This paper shows that CIL is learnable. Based on the theory, a new CIL algorithm is also proposed. Experimental results demonstrate its effectiveness.
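    The WP/OOD decomposition the abstract alludes to is usually written as P(class | x) = P(class | x, task) * P(task | x), where the task posterior comes from per-task OOD detectors. The tasks, classes, and probabilities below are invented purely to illustrate how the two pieces combine into a task-free prediction.

```python
# Hypothetical within-task predictions: P(class | x, task).
wp = {
    "task0": {"cat": 0.9, "dog": 0.1},
    "task1": {"car": 0.7, "bus": 0.3},
}
# Hypothetical task posterior from OOD scores: P(task | x).
task_post = {"task0": 0.8, "task1": 0.2}

def cil_posterior(wp, task_post):
    """Combine within-task predictions into a task-free class posterior."""
    post = {}
    for task, classes in wp.items():
        for cls, p in classes.items():
            post[cls] = post.get(cls, 0.0) + p * task_post[task]
    return post

post = cil_posterior(wp, task_post)
print(max(post, key=post.get))  # -> "cat"
```

Because each factor is a probability distribution, the combined scores again sum to one, so the model can classify over all classes seen so far without being told the task identity.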
    Evolving Computation Graphs. (arXiv:2306.12943v1 [cs.LG])
    Graph neural networks (GNNs) have demonstrated success in modeling relational data, especially for data that exhibits homophily: when a connection between nodes tends to imply that they belong to the same class. However, while this assumption is true in many relevant situations, there are important real-world scenarios that violate this assumption, and this has spurred research into improving GNNs for these cases. In this work, we propose Evolving Computation Graphs (ECGs), a novel method for enhancing GNNs on heterophilic datasets. Our approach builds on prior theoretical insights linking node degree, high homophily, and inter- vs intra-class embedding similarity by rewiring the GNNs' computation graph to add edges that connect nodes that are likely to be in the same class. We utilise weaker classifiers to identify these edges, ultimately improving GNN performance on non-homophilic data. We evaluate ECGs on a diverse set of recently-proposed heterophilous datasets and demonstrate improvements over the relevant baselines. ECGs present a simple, intuitive and elegant approach for improving GNN performance on heterophilic datasets without requiring prior domain knowledge.
    Generating Synergistic Formulaic Alpha Collections via Reinforcement Learning. (arXiv:2306.12964v1 [q-fin.ST])
    In the field of quantitative trading, it is common practice to transform raw historical stock data into indicative signals for the market trend. Such signals are called alpha factors. Alphas in formula forms are more interpretable and thus favored by practitioners concerned with risk. In practice, a set of formulaic alphas is often used together for better modeling precision, so we need to find synergistic formulaic alpha sets that work well together. However, most traditional alpha generators mine alphas one by one separately, overlooking the fact that the alphas would be combined later. In this paper, we propose a new alpha-mining framework that prioritizes mining a synergistic set of alphas, i.e., it directly uses the performance of the downstream combination model to optimize the alpha generator. Our framework also leverages the strong exploratory capabilities of reinforcement learning (RL) to better explore the vast search space of formulaic alphas. The contribution to the combination model's performance is assigned as the return used in the RL process, driving the alpha generator to find better alphas that improve upon the current set. Experimental evaluations on real-world stock market data demonstrate both the effectiveness and the efficiency of our framework for stock trend forecasting. The investment simulation results show that our framework is able to achieve higher returns compared to previous approaches.
    Distributed Sparse Regression via Penalization. (arXiv:2111.06530v2 [cs.LG] UPDATED)
    We study sparse linear regression over a network of agents, modeled as an undirected graph (with no centralized node). The estimation problem is formulated as the minimization of the sum of the local LASSO loss functions plus a quadratic penalty of the consensus constraint -- the latter being instrumental to obtain distributed solution methods. While penalty-based consensus methods have been extensively studied in the optimization literature, their statistical and computational guarantees in the high dimensional setting remain unclear. This work provides an answer to this open problem. Our contribution is two-fold. First, we establish statistical consistency of the estimator: under a suitable choice of the penalty parameter, the optimal solution of the penalized problem achieves near optimal minimax rate $\mathcal{O}(s \log d/N)$ in $\ell_2$-loss, where $s$ is the sparsity value, $d$ is the ambient dimension, and $N$ is the total sample size in the network -- this matches centralized sample rates. Second, we show that the proximal-gradient algorithm applied to the penalized problem, which naturally leads to distributed implementations, converges linearly up to a tolerance of the order of the centralized statistical error -- the rate scales as $\mathcal{O}(d)$, revealing an unavoidable speed-accuracy dilemma. Numerical results demonstrate the tightness of the derived sample rate and convergence rate scalings.
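    The proximal-gradient algorithm the abstract analyzes alternates a gradient step on the smooth loss with the l1 proximal map (soft-thresholding). As a minimal sketch, the snippet below runs plain ISTA on a single node's LASSO problem; the quadratic consensus penalty that couples the network's agents is omitted, and all problem sizes and constants are made up for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(0)
d, n, lam = 20, 100, 0.1
theta_true = np.zeros(d)
theta_true[:3] = [1.0, -2.0, 1.5]              # sparse ground truth
X = rng.standard_normal((n, d))
y = X @ theta_true + 0.01 * rng.standard_normal(n)

step = n / np.linalg.norm(X, 2) ** 2           # 1/L for the smooth part
theta = np.zeros(d)
for _ in range(500):                           # ISTA iterations
    grad = X.T @ (X @ theta - y) / n           # gradient of (1/2n)||X theta - y||^2
    theta = soft_threshold(theta - step * grad, step * lam)

print(np.count_nonzero(np.abs(theta) > 1e-3))  # a sparse estimate survives
```

In the distributed version each agent would run this same step on its local data, with an extra gradient term from the consensus penalty pulling neighboring agents' iterates together.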
    Complex Preferences for Different Convergent Priors in Discrete Graph Diffusion. (arXiv:2306.02957v2 [cs.LG] UPDATED)
    Diffusion models have achieved state-of-the-art performance in generating many different kinds of data, including images, text, and videos. Despite their success, there has been limited research on how the underlying diffusion process and the final convergent prior can affect generative performance; this research has also been limited to continuous data types and a score-based diffusion framework. To fill this gap, we explore how different discrete diffusion kernels (which converge to different prior distributions) affect the performance of diffusion models for graphs. To this end, we developed a novel formulation of a family of discrete diffusion kernels which are easily adjustable to converge to different Bernoulli priors, and we study the effect of these different kernels on generative performance. We show that the quality of generated graphs is sensitive to the prior used, and that the optimal choice cannot be explained by obvious statistics or metrics, which challenges the intuitions which previous works have suggested.
    PhAST: Physics-Aware, Scalable, and Task-specific GNNs for Accelerated Catalyst Design. (arXiv:2211.12020v3 [cs.LG] UPDATED)
    Mitigating the climate crisis requires a rapid transition towards lower-carbon energy. Catalyst materials play a crucial role in the electrochemical reactions involved in numerous industrial processes key to this transition, such as renewable energy storage and electrofuel synthesis. To reduce the energy spent on such activities, we must quickly discover more efficient catalysts to drive electrochemical reactions. Machine learning (ML) holds the potential to efficiently model materials properties from large amounts of data, accelerating electrocatalyst design. The Open Catalyst Project OC20 dataset was constructed to that end. However, ML models trained on OC20 are still neither scalable nor accurate enough for practical applications. In this paper, we propose task-specific innovations applicable to most architectures, enhancing both computational efficiency and accuracy. This includes improvements in (1) the graph creation step, (2) atom representations, (3) the energy prediction head, and (4) the force prediction head. We describe these contributions and evaluate them thoroughly on multiple architectures. Overall, our proposed PhAST improvements decrease energy MAE by 4 to 42$\%$ while dividing compute time by 3 to 8$\times$ depending on the targeted task/model. PhAST also enables CPU training, leading to 40$\times$ speedups in highly parallelized settings. Python package: \url{https://phast.readthedocs.io}.
    A One-Sample Decentralized Proximal Algorithm for Non-Convex Stochastic Composite Optimization. (arXiv:2302.09766v2 [math.OC] UPDATED)
    We focus on decentralized stochastic non-convex optimization, where $n$ agents work together to optimize a composite objective function which is a sum of a smooth term and a non-smooth convex term. To solve this problem, we propose two single-time scale algorithms: Prox-DASA and Prox-DASA-GT. These algorithms can find $\epsilon$-stationary points in $\mathcal{O}(n^{-1}\epsilon^{-2})$ iterations using constant batch sizes (i.e., $\mathcal{O}(1)$). Unlike prior work, our algorithms achieve comparable complexity without requiring large batch sizes, more complex per-iteration operations (such as double loops), or stronger assumptions. Our theoretical findings are supported by extensive numerical experiments, which demonstrate the superiority of our algorithms over previous approaches. Our code is available at https://github.com/xuxingc/ProxDASA.
    Evaluating Online Bandit Exploration In Large-Scale Recommender System. (arXiv:2304.02572v2 [cs.IR] UPDATED)
    Bandit learning has been an increasingly popular design choice for recommender systems. Despite the strong interest in bandit learning from the community, there remain multiple bottlenecks that prevent many bandit learning approaches from productionization. One major bottleneck is how to test the effectiveness of a bandit algorithm fairly and without data leakage. Different from supervised learning algorithms, bandit learning algorithms place great emphasis on the data collection process through their explorative nature. Such explorative behavior may induce unfair evaluation in a classic A/B test setting. In this work, we apply the upper confidence bound (UCB) algorithm to our large-scale short-video recommender system and present a test framework for the production bandit learning life-cycle with a new set of metrics. Extensive experiment results show that our experiment design is able to fairly evaluate the performance of bandit learning in the recommender system.
    Interpretation of immunofluorescence slides by deep learning techniques: anti-nuclear antibodies case study. (arXiv:2306.12432v1 [q-bio.QM])
    Diseases are increasing in both number and severity. Immune diseases, which affected 8% of the world population in 2017 according to the World Health Organization (WHO), are an area of medicine that warrants attention due to the high rate of disease occurrence in this category. This work presents an up-to-date review of state-of-the-art immune disease healthcare solutions. We focus on tackling the issue with modern solutions such as deep learning to detect anomalies in the early stages, thereby providing health practitioners with efficient tools. We rely on advanced deep learning techniques such as Convolutional Neural Networks (CNNs) to provide an efficient tool, together with a thorough analysis of this solution. The proposed solution was tested and evaluated by the immunology department in the Principal Military Hospital of Instruction of Tunis, which considered it a very helpful tool.
    Quilt-1M: One Million Image-Text Pairs for Histopathology. (arXiv:2306.11207v2 [cs.CV] UPDATED)
    Recent accelerations in multi-modal applications have been made possible by the plethora of image and text data available online. However, the scarcity of analogous data in the medical field, specifically in histopathology, has halted comparable progress. To enable similar representation learning for histopathology, we turn to YouTube, an untapped resource of videos, offering $1,087$ hours of valuable educational histopathology videos from expert clinicians. From YouTube, we curate Quilt: a large-scale vision-language dataset consisting of $768,826$ image and text pairs. Quilt was automatically curated using a mixture of models, including large language models, handcrafted algorithms, human knowledge databases, and automatic speech recognition. In comparison, the most comprehensive datasets curated for histopathology amass only around $200$K samples. We combine Quilt with datasets from other sources, including Twitter, research papers, and the internet in general, to create an even larger dataset: Quilt-1M, with $1$M paired image-text samples, marking it as the largest vision-language histopathology dataset to date. We demonstrate the value of Quilt-1M by fine-tuning a pre-trained CLIP model. Our model outperforms state-of-the-art models on both zero-shot and linear probing tasks for classifying new histopathology images across $13$ diverse patch-level datasets of $8$ different sub-pathologies and cross-modal retrieval tasks.
    Achieving Sample and Computational Efficient Reinforcement Learning by Action Space Reduction via Grouping. (arXiv:2306.12981v1 [cs.LG])
    Reinforcement learning often needs to deal with the exponential growth of states and actions when exploring optimal control in high-dimensional spaces (often known as the curse of dimensionality). In this work, we address this issue by learning the inherent structure of action-wise similar MDPs to appropriately balance the performance degradation versus sample/computational complexity. In particular, we partition the action spaces into multiple groups based on the similarity in transition distribution and reward function, and build a linear decomposition model to capture the difference between the intra-group transition kernel and the intra-group rewards. Both our theoretical analysis and experiments reveal a \emph{surprising and counter-intuitive result}: while a more refined grouping strategy can reduce the approximation error caused by treating actions in the same group as identical, it also leads to increased estimation error when the number of samples or the computational resources are limited. This finding highlights the grouping strategy as a new degree of freedom that can be optimized to minimize the overall performance loss. To address this issue, we formulate a general optimization problem for determining the optimal grouping strategy, which strikes a balance between performance loss and sample/computational complexity. We further propose a computationally efficient method for selecting a nearly-optimal grouping strategy, whose computational complexity is independent of the size of the action space.
    Classification Tree Pruning Under Covariate Shift. (arXiv:2305.04335v2 [stat.ML] UPDATED)
    We consider the problem of \emph{pruning} a classification tree, that is, selecting a suitable subtree that balances bias and variance, in common situations with inhomogeneous training data. Namely, assuming access to mostly data from a distribution $P_{X, Y}$, but little data from a desired distribution $Q_{X, Y}$ with different $X$-marginals, we present the first efficient procedure for optimal pruning in such situations, when cross-validation and other penalized variants are grossly inadequate. Optimality is derived with respect to a notion of \emph{average discrepancy} $P_{X} \to Q_{X}$ (averaged over $X$ space) which significantly relaxes a recent notion -- termed \emph{transfer-exponent} -- shown to tightly capture the limits of classification under such a distribution shift. Our relaxed notion can be viewed as a measure of \emph{relative dimension} between distributions, as it relates to existing notions of information such as the Minkowski and Renyi dimensions.
    Federated Few-shot Learning. (arXiv:2306.10234v2 [cs.LG] UPDATED)
    Federated Learning (FL) enables multiple clients to collaboratively learn a machine learning model without exchanging their own local data. In this way, the server can exploit the computational power of all clients and train the model on a larger set of data samples among all clients. Although such a mechanism is proven to be effective in various fields, existing works generally assume that each client preserves sufficient data for training. In practice, however, certain clients may only contain a limited number of samples (i.e., few-shot samples). For example, the available photo data taken by a specific user with a new mobile device is relatively rare. In this scenario, existing FL efforts typically encounter a significant performance drop on these clients. Therefore, it is urgent to develop a few-shot model that can generalize to clients with limited data under the FL scenario. In this paper, we refer to this novel problem as federated few-shot learning. Nevertheless, the problem remains challenging due to two major reasons: the global data variance among clients (i.e., the difference in data distributions among clients) and the local data insufficiency in each client (i.e., the lack of adequate local data for training). To overcome these two challenges, we propose a novel federated few-shot learning framework with two separately updated models and dedicated training strategies to reduce the adverse impact of global data variance and local data insufficiency. Extensive experiments on four prevalent datasets that cover news articles and images validate the effectiveness of our framework compared with the state-of-the-art baselines. Our code is provided at https://github.com/SongW-SW/F2L.
    Sharp analysis of EM for learning mixtures of pairwise differences. (arXiv:2302.10066v2 [math.ST] UPDATED)
    We consider a symmetric mixture of linear regressions with random samples from the pairwise comparison design, which can be seen as a noisy version of a type of Euclidean distance geometry problem. We analyze the expectation-maximization (EM) algorithm locally around the ground truth and establish that the sequence converges linearly, providing an $\ell_\infty$-norm guarantee on the estimation error of the iterates. Furthermore, we show that the limit of the EM sequence achieves the sharp rate of estimation in the $\ell_2$-norm, matching the information-theoretically optimal constant. We also argue through simulation that convergence from a random initialization is much more delicate in this setting, and does not appear to occur in general. Our results show that the EM algorithm can exhibit several unique behaviors when the covariate distribution is suitably structured.
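    A minimal instance of the model class analyzed above is the symmetric mixture y = z * <x, theta*> + noise with a hidden sign z = +-1; the pairwise-comparison design is simplified away here, and the dimensions, noise level, and iteration count are arbitrary. The E-step fills in the posterior mean of the hidden sign and the M-step is weighted least squares.

```python
import numpy as np

# EM for a symmetric mixture of linear regressions (a simplified sketch,
# not the paper's pairwise-difference setting).
rng = np.random.default_rng(1)
n, d, sigma = 500, 5, 0.1
theta_star = rng.standard_normal(d)
X = rng.standard_normal((n, d))
z = rng.choice([-1.0, 1.0], size=n)            # hidden mixture labels
y = z * (X @ theta_star) + sigma * rng.standard_normal(n)

# Initialize near the truth (the abstract notes random starts are delicate).
theta = theta_star + 0.5 * rng.standard_normal(d)
XtX_inv = np.linalg.inv(X.T @ X)
for _ in range(50):
    # E-step: posterior mean of the hidden sign z_i.
    w = np.tanh(y * (X @ theta) / sigma**2)
    # M-step: least squares with the signs "filled in".
    theta = XtX_inv @ (X.T @ (w * y))

# theta is identifiable only up to a global sign flip.
err = min(np.linalg.norm(theta - theta_star),
          np.linalg.norm(theta + theta_star))
print(f"recovery error: {err:.3f}")
```

Run from this local initialization, the iterates contract toward (a sign flip of) the ground truth, consistent with the local linear convergence the abstract establishes.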
    VL-CheckList: Evaluating Pre-trained Vision-Language Models with Objects, Attributes and Relations. (arXiv:2207.00221v2 [cs.CV] UPDATED)
    Vision-Language Pretraining (VLP) models have recently successfully facilitated many cross-modal downstream tasks. Most existing works evaluate their systems by comparing the fine-tuned downstream task performance. However, average downstream-task accuracy alone provides little information about the pros and cons of each VLP method, let alone insights into how the community can improve the systems in the future. Inspired by CheckList for testing natural language processing, we propose VL-CheckList, a novel framework to understand the capabilities of VLP models. The proposed method divides the image-texting ability of a VLP model into three categories: objects, attributes, and relations, and uses a novel taxonomy to further break down these three aspects. We conduct comprehensive studies to analyze seven recently popular VLP models via the proposed framework. Results confirm the effectiveness of the proposed method by revealing fine-grained differences among the compared models that were not visible from downstream task-only evaluation. Further results show promising research directions for building better VLP models. Our data and code are available at: https://github.com/om-ai-lab/VL-CheckList.
    Sample Complexity for Quadratic Bandits: Hessian Dependent Bounds and Optimal Algorithms. (arXiv:2306.12383v2 [cs.LG] UPDATED)
    In stochastic zeroth-order optimization, a problem of practical relevance is understanding how to fully exploit the local geometry of the underlying objective function. We consider a fundamental setting in which the objective function is quadratic, and provide the first tight characterization of the optimal Hessian-dependent sample complexity. Our contribution is twofold. First, from an information-theoretic point of view, we prove tight lower bounds on Hessian-dependent complexities by introducing a concept called energy allocation, which captures the interaction between the searching algorithm and the geometry of objective functions. A matching upper bound is obtained by solving the optimal energy spectrum. Then, algorithmically, we show the existence of a Hessian-independent algorithm that universally achieves the asymptotic optimal sample complexities for all Hessian instances. The optimal sample complexities achieved by our algorithm remain valid for heavy-tailed noise distributions, which are enabled by a truncation method.
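    The zeroth-order setting above can be illustrated with the standard two-point gradient estimator (a generic baseline, not the paper's Hessian-dependent algorithm); the quadratic objective, noise level, step size, and iteration budget below are arbitrary choices for the sketch.

```python
import numpy as np

# Zeroth-order optimization: the algorithm only sees noisy function values.
rng = np.random.default_rng(2)
H = np.array([[3.0, 0.0], [0.0, 1.0]])        # Hessian of the quadratic
f = lambda x: 0.5 * x @ H @ x + 0.001 * rng.standard_normal()  # noisy oracle

x = np.array([2.0, -2.0])
mu, step = 1e-3, 0.05                          # smoothing radius, step size
for _ in range(2000):
    u = rng.standard_normal(2)                 # random search direction
    # Two-point estimate of the directional derivative, scaled back to a vector.
    g = (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    x = x - step * g

print(np.linalg.norm(x))  # should approach the minimizer at the origin
```

For a quadratic, the two-point difference is exact in expectation (the even terms cancel), so the remaining error is driven by the oracle noise and the Hessian's conditioning -- exactly the interaction the abstract's Hessian-dependent bounds quantify.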
    Evading Forensic Classifiers with Attribute-Conditioned Adversarial Faces. (arXiv:2306.13091v1 [cs.CV])
    The ability of generative models to produce highly realistic synthetic face images has raised security and ethical concerns. As a first line of defense against such fake faces, deep learning based forensic classifiers have been developed. While these forensic models can detect whether a face image is synthetic or real with high accuracy, they are also vulnerable to adversarial attacks. Although such attacks can be highly successful in evading detection by forensic classifiers, they introduce visible noise patterns that are detectable through careful human scrutiny. Additionally, these attacks assume access to the target model(s) which may not always be true. Attempts have been made to directly perturb the latent space of GANs to produce adversarial fake faces that can circumvent forensic classifiers. In this work, we go one step further and show that it is possible to successfully generate adversarial fake faces with a specified set of attributes (e.g., hair color, eye size, race, gender, etc.). To achieve this goal, we leverage the state-of-the-art generative model StyleGAN with disentangled representations, which enables a range of modifications without leaving the manifold of natural images. We propose a framework to search for adversarial latent codes within the feature space of StyleGAN, where the search can be guided either by a text prompt or a reference image. We also propose a meta-learning based optimization strategy to achieve transferable performance on unknown target models. Extensive experiments demonstrate that the proposed approach can produce semantically manipulated adversarial fake faces, which are true to the specified attribute set and can successfully fool forensic face classifiers, while remaining undetectable by humans. Code: https://github.com/koushiksrivats/face_attribute_attack.
    Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields. (arXiv:2306.12760v1 [cs.CV])
    Editing a local region or a specific object in a 3D scene represented by a NeRF is challenging, mainly due to the implicit nature of the scene representation. Consistently blending a new realistic object into the scene adds an additional level of difficulty. We present Blended-NeRF, a robust and flexible framework for editing a specific region of interest in an existing NeRF scene, based on text prompts or image patches, along with a 3D ROI box. Our method leverages a pretrained language-image model to steer the synthesis towards a user-provided text prompt or image patch, along with a 3D MLP model initialized on an existing NeRF scene to generate the object and blend it into a specified region in the original scene. We allow local editing by localizing a 3D ROI box in the input scene, and seamlessly blend the content synthesized inside the ROI with the existing scene using a novel volumetric blending technique. To obtain natural looking and view-consistent results, we leverage existing and new geometric priors and 3D augmentations for improving the visual fidelity of the final result. We test our framework both qualitatively and quantitatively on a variety of real 3D scenes and text prompts, demonstrating realistic multi-view consistent results with much flexibility and diversity compared to the baselines. Finally, we show the applicability of our framework for several 3D editing applications, including adding new objects to a scene, removing/replacing/altering existing objects, and texture conversion.
    Controlling Privacy Loss in Sampling Schemes: an Analysis of Stratified and Cluster Sampling. (arXiv:2007.12674v2 [stat.ME] UPDATED)
    Sampling schemes are fundamental tools in statistics, survey design, and algorithm design. A fundamental result in differential privacy is that a differentially private mechanism run on a simple random sample of a population provides stronger privacy guarantees than the same algorithm run on the entire population. However, in practice, sampling designs are often more complex than the simple, data-independent sampling schemes that are addressed in prior work. In this work, we extend the study of privacy amplification results to more complex, data-dependent sampling schemes. We find that not only do these sampling schemes often fail to amplify privacy, they can actually result in privacy degradation. We analyze the privacy implications of the pervasive cluster sampling and stratified sampling paradigms, as well as provide some insight into the study of more general sampling designs.
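    For contrast with the data-dependent designs studied above, the classic amplification result for data-independent Poisson subsampling (each record included independently with probability q) has a closed form. This is the textbook statement, not a result of the paper:

```python
import math

def amplified_eps(eps, q):
    """Privacy of an eps-DP mechanism run on a Poisson subsample with rate q:
    eps' = log(1 + q * (exp(eps) - 1)) <= eps."""
    return math.log(1 + q * (math.exp(eps) - 1))

# Subsampling strictly improves the guarantee whenever q < 1.
print(amplified_eps(1.0, 0.1))   # ~0.1586, much smaller than eps = 1.0
```

The abstract's point is that stratified and cluster sampling, being data-dependent, need not obey any such bound and can even degrade privacy.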
    On the explainable properties of 1-Lipschitz Neural Networks: An Optimal Transport Perspective. (arXiv:2206.06854v2 [cs.AI] UPDATED)
    Input gradients have a pivotal role in a variety of applications, including adversarial attack algorithms for evaluating model robustness, explainable AI techniques for generating Saliency Maps, and counterfactual explanations. However, Saliency Maps generated by traditional neural networks are often noisy and provide limited insights. In this paper, we demonstrate that, on the contrary, the Saliency Maps of 1-Lipschitz neural networks, learnt with the dual loss of an optimal transportation problem, exhibit desirable XAI properties: they are highly concentrated on the essential parts of the image with low noise, significantly outperforming state-of-the-art explanation approaches across various models and metrics. We also prove that these maps align unprecedentedly well with human explanations on ImageNet. To explain the particularly beneficial properties of the Saliency Map for such models, we prove that this gradient encodes both the direction of the transportation plan and the direction towards the nearest adversarial attack. Following the gradient down to the decision boundary is no longer considered an adversarial attack, but rather a counterfactual explanation that explicitly transports the input from one class to another. Thus, learning with such a loss jointly optimizes the classification objective and the alignment of the gradient, i.e. the Saliency Map, with the transportation plan direction. These networks were previously known to be certifiably robust by design, and we demonstrate that they scale well for large problems and models, and are tailored for explainability using a fast and straightforward method.
    Squeeze, Recover and Relabel: Dataset Condensation at ImageNet Scale From A New Perspective. (arXiv:2306.13092v1 [cs.CV])
    We present a new dataset condensation framework termed Squeeze, Recover and Relabel (SRe$^2$L) that decouples the bilevel optimization of model and synthetic data during training, to handle varying scales of datasets, model architectures and image resolutions for effective dataset condensation. The proposed method demonstrates flexibility across diverse dataset scales and exhibits multiple advantages in terms of arbitrary resolutions of synthesized images, low training cost and memory consumption with high-resolution training, and the ability to scale up to arbitrary evaluation network architectures. Extensive experiments are conducted on Tiny-ImageNet and full ImageNet-1K datasets. Under 50 IPC, our approach achieves the highest 42.5% and 60.8% validation accuracy on Tiny-ImageNet and ImageNet-1K, outperforming all previous state-of-the-art methods by margins of 14.5% and 32.9%, respectively. Our approach also outperforms MTT by approximately 52$\times$ (ConvNet-4) and 16$\times$ (ResNet-18) faster in speed with less memory consumption of 11.6$\times$ and 6.4$\times$ during data synthesis. Our code and condensed datasets of 50, 200 IPC with 4K recovery budget are available at https://zeyuanyin.github.io/projects/SRe2L/.
    Decentralized Multi-Agent Reinforcement Learning with Global State Prediction. (arXiv:2306.12926v1 [cs.RO])
    Deep reinforcement learning (DRL) has seen remarkable success in the control of single robots. However, applying DRL to robot swarms presents significant challenges. A critical challenge is non-stationarity, which occurs when two or more robots update individual or shared policies concurrently, thereby engaging in an interdependent training process with no guarantees of convergence. Circumventing non-stationarity typically involves training the robots with global information about other agents' states and/or actions. In contrast, in this paper we explore how to remove the need for global information. We pose our problem as a Partially Observable Markov Decision Process, due to the absence of global knowledge on other agents. Using collective transport as a testbed scenario, we study two approaches to multi-agent training. In the first, the robots exchange no messages, and are trained to rely on implicit communication through push-and-pull on the object to transport. In the second approach, we introduce Global State Prediction (GSP), a network trained to form a belief over the swarm as a whole and predict its future states. We provide a comprehensive study over four well-known deep reinforcement learning algorithms in environments with obstacles, measuring performance as the successful transport of the object to the goal within a desired time-frame. Through an ablation study, we show that including GSP boosts performance and increases robustness when compared with methods that use global knowledge.
    Triggering Dark Showers with Conditional Dual Auto-Encoders. (arXiv:2306.12955v1 [hep-ex])
    Auto-encoders (AEs) have the potential to be effective and generic tools for new physics searches at colliders, requiring little to no model-dependent assumptions. New hypothetical physics signals can be considered anomalies that deviate from the well-known background processes generally expected to describe the whole dataset. We present a search formulated as an anomaly detection (AD) problem, using an AE to define a criterion to decide about the physics nature of an event. In this work, we perform an AD search for manifestations of a dark version of the strong force using raw detector images, which are large and very sparse, without leveraging any physics-based pre-processing or assumption on the signals. We propose a dual-encoder design which can learn a compact latent space through conditioning. In the context of multiple AD metrics, we present a clear improvement over competitive baselines and prior approaches. It is the first time that an AE is shown to exhibit excellent discrimination against multiple dark shower models, illustrating the suitability of this method as a performant, model-independent algorithm to deploy, e.g., in the trigger stage of LHC experiments such as ATLAS and CMS.
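The anomaly-detection criterion itself can be sketched with a minimal stand-in (a closed-form linear auto-encoder via PCA rather than the paper's conditional dual AE; all data here is synthetic): fit on background only, then score events by reconstruction error.

```python
import numpy as np

# Hedged sketch of the AD criterion only (not the paper's architecture):
# a linear auto-encoder (top-k PCA is its closed-form optimum) is fit on
# background events; events with large reconstruction error are anomalous.
rng = np.random.default_rng(1)
background = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10)) * 0.1
background[:, :2] += rng.normal(size=(500, 2)) * 3.0   # 2 dominant modes

mu = background.mean(axis=0)
_, _, vt = np.linalg.svd(background - mu, full_matrices=False)
encoder = vt[:2]                                       # latent dim = 2

def anomaly_score(x):
    z = (x - mu) @ encoder.T          # encode into the latent space
    recon = z @ encoder + mu          # decode back to input space
    return np.linalg.norm(x - recon, axis=-1)

signal = rng.normal(size=(100, 10)) * 4.0              # off-manifold events
# signal events reconstruct poorly, so they score higher than background
```

A real dual AE learns a nonlinear latent space with conditioning, but the decision rule — threshold the reconstruction error — is the same.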
    A Semi-supervised Sensing Rate Learning based CMAB Scheme to Combat COVID-19 by Trustful Data Collection in the Crowd. (arXiv:2301.08563v2 [cs.HC] UPDATED)
    The recruitment of trustworthy and high-quality workers is an important research issue for mobile crowdsensing (MCS). Previous studies either assume that the qualities of workers are known in advance, or assume that the platform knows the qualities of workers once it receives their collected data. In reality, to reduce costs and thus maximize revenue, many strategic workers do not perform their sensing tasks honestly and report fake data to the platform; this is known as a false data attack, and it is very hard for the platform to evaluate the authenticity of the received data. In this paper, an incentive mechanism named Semi-supervision based Combinatorial Multi-Armed Bandit reverse Auction (SCMABA) is proposed to solve the recruitment problem of multiple unknown and strategic workers in MCS. First, we model worker recruitment as a multi-armed bandit reverse auction problem and design a UCB-based algorithm to separate exploration and exploitation, regarding the Sensing Rates (SRs) of recruited workers as the gain of the bandit. Next, a Semi-supervised Sensing Rate Learning (SSRL) approach is proposed to quickly and accurately obtain the workers' SRs, which consists of two phases, supervision and self-supervision. Finally, SCMABA is designed by organically combining the SR acquisition mechanism with the multi-armed bandit reverse auction, where supervised SR learning is used in exploration and self-supervised learning in exploitation. We theoretically prove that SCMABA achieves truthfulness and individual rationality, and we demonstrate its outstanding performance through in-depth simulations on real-world data traces.
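The exploration side of this design can be sketched with a standard UCB1 loop (hypothetical sensing rates and a plain bandit, not the full SCMABA auction): each worker is an arm whose reward is a noisy observation of their sensing rate.

```python
import numpy as np

# Minimal UCB sketch of the exploration idea (not full SCMABA): treat each
# worker as an arm whose reward is a noisy draw of their (initially
# unknown) Sensing Rate; the optimism bonus drives exploration.
rng = np.random.default_rng(2)
true_sr = np.array([0.9, 0.6, 0.3])       # hidden worker sensing rates
counts = np.ones(3)                        # pull each arm once to start
means = np.array([rng.binomial(1, p) for p in true_sr], dtype=float)

for t in range(2, 2000):
    ucb = means + np.sqrt(2 * np.log(t) / counts)   # optimism bonus
    a = int(np.argmax(ucb))
    reward = rng.binomial(1, true_sr[a])            # observe a noisy SR
    counts[a] += 1
    means[a] += (reward - means[a]) / counts[a]     # running mean update

# the best worker ends up recruited far more often than the others
```

SCMABA additionally runs a reverse auction over workers and swaps this naive running mean for the semi-supervised SR estimator; the bandit skeleton is the same.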
    Auditing Predictive Models for Intersectional Biases. (arXiv:2306.13064v1 [cs.LG])
    Predictive models that satisfy group fairness criteria in aggregate for members of a protected class, but do not guarantee subgroup fairness, could produce biased predictions for individuals at the intersection of two or more protected classes. To address this risk, we propose Conditional Bias Scan (CBS), a flexible auditing framework for detecting intersectional biases in classification models. CBS identifies the subgroup for which there is the most significant bias against the protected class, as compared to the equivalent subgroup in the non-protected class, and can incorporate multiple commonly used fairness definitions for both probabilistic and binarized predictions. We show that this methodology can detect previously unidentified intersectional and contextual biases in the COMPAS pre-trial risk assessment tool and has higher bias detection power compared to similar methods that audit for subgroup fairness.
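A naive version of a subgroup scan looks as follows (illustrative only: CBS uses a significance-based scan statistic and supports multiple fairness definitions, not raw mean gaps; the data and the planted bias are synthetic).

```python
import numpy as np

# Toy sketch of the scan idea (not the CBS statistic): for every subgroup
# defined by a feature value, compare the mean predicted risk between the
# protected and non-protected class, and report the most biased subgroup.
rng = np.random.default_rng(3)
n = 1000
subgroup = rng.integers(0, 4, size=n)            # e.g. an age bucket
protected = rng.integers(0, 2, size=n)           # protected-class flag
pred = rng.uniform(size=n)                       # model risk scores
pred[(subgroup == 2) & (protected == 1)] += 0.3  # planted intersectional bias

gaps = []
for s in range(4):
    m = subgroup == s
    gap = pred[m & (protected == 1)].mean() - pred[m & (protected == 0)].mean()
    gaps.append(gap)

worst = int(np.argmax(gaps))   # subgroup with the largest protected-class gap
```

Scanning only marginal groups would dilute this planted bias across all subgroups; conditioning on the subgroup is what makes the intersectional effect visible.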
    Machine-Learning-Assisted and Real-Time-Feedback-Controlled Growth of InAs/GaAs Quantum Dots. (arXiv:2306.12898v1 [cond-mat.mes-hall])
    Self-assembled InAs/GaAs quantum dots (QDs) have properties highly valuable for developing various optoelectronic devices such as QD lasers and single photon sources. The applications strongly rely on the density and quality of these dots, which has motivated studies of growth process control to realize high-quality epi-wafers and devices. Establishing the process parameters in molecular beam epitaxy (MBE) for a specific density of QDs is a multidimensional optimization challenge, usually addressed through time-consuming and iterative trial-and-error. Meanwhile, reflective high-energy electron diffraction (RHEED) has been widely used to capture a wealth of growth information in situ, but it still faces the challenge of extracting information from noisy and overlapping images. Here, based on 3D ResNet, we developed a machine learning (ML) model specially designed for training on RHEED videos instead of static images, providing real-time feedback on surface morphologies for process control. We demonstrated that ML from previous growth could predict the post-growth density of QDs, by successfully tuning the QD densities in near-real time from 1.5E10 cm-2 down to 3.8E8 cm-2 or up to 1.4E11 cm-2. Compared to traditional methods, our approach, with in-situ tuning capabilities and excellent reliability, can dramatically expedite the material optimization process and improve the reproducibility of MBE growth, constituting significant progress for thin film growth techniques. The concepts and methodologies proven feasible in this work are promising candidates for application to a variety of material growth processes, which could revolutionize semiconductor manufacturing for the microelectronic and optoelectronic industries.
    Approximation Algorithms for Fair Range Clustering. (arXiv:2306.06778v2 [cs.LG] UPDATED)
    This paper studies the fair range clustering problem in which the data points are from different demographic groups and the goal is to pick $k$ centers with the minimum clustering cost such that each group is at least minimally represented in the centers set and no group dominates the centers set. More precisely, given a set of $n$ points in a metric space $(P,d)$ where each point belongs to one of the $\ell$ different demographics (i.e., $P = P_1 \uplus P_2 \uplus \cdots \uplus P_\ell$) and a set of $\ell$ intervals $[\alpha_1, \beta_1], \cdots, [\alpha_\ell, \beta_\ell]$ on the desired number of centers from each group, the goal is to pick a set of $k$ centers $C$ with minimum $\ell_p$-clustering cost (i.e., $(\sum_{v\in P} d(v,C)^p)^{1/p}$) such that for each group $i \in [\ell]$, $|C\cap P_i| \in [\alpha_i, \beta_i]$. In particular, the fair range $\ell_p$-clustering captures fair range $k$-center, $k$-median and $k$-means as its special cases. In this work, we provide efficient constant factor approximation algorithms for fair range $\ell_p$-clustering for all values of $p\in [1,\infty)$.
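The objective and constraints can be written down directly (a sketch of the problem statement on toy points, not of the approximation algorithms themselves):

```python
import numpy as np

# Sketch of the fair range clustering ingredients: the l_p clustering cost
# and the per-group range constraints on the chosen centers.
def lp_cost(points, centers, p=2):
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
    return float((d.min(axis=1) ** p).sum() ** (1 / p))

def is_fair(center_groups, alpha, beta):
    # |C intersect P_i| must lie in [alpha_i, beta_i] for every group i
    counts = np.bincount(center_groups, minlength=len(alpha))
    return bool(np.all((counts >= alpha) & (counts <= beta)))

points = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
centers = np.array([[0.5, 0.0], [10.5, 0.0]])
cost = lp_cost(points, centers)   # four distances of 0.5 each, p = 2
```

With `p = 2` this is the k-means-style cost; `p = 1` gives k-median, and taking the max instead of the sum recovers k-center, matching the special cases named in the abstract.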
    Inferring the finest pattern of mutual independence from data. (arXiv:2306.12984v1 [stat.ML])
    For a random variable $X$, we are interested in the blind extraction of its finest mutual independence pattern $\mu ( X )$. We introduce a specific kind of independence that we call dichotomic. If $\Delta ( X )$ stands for the set of all patterns of dichotomic independence that hold for $X$, we show that $\mu ( X )$ can be obtained as the intersection of all elements of $\Delta ( X )$. We then propose a method to estimate $\Delta ( X )$ when the data are independent and identically distributed (i.i.d.) realizations of a multivariate normal distribution. If $\hat{\Delta} ( X )$ is the estimated set of valid patterns of dichotomic independence, we estimate $\mu ( X )$ as the intersection of all patterns of $\hat{\Delta} ( X )$. The method is tested on simulated data, showing its advantages and limits. We also consider an application to a toy example as well as to experimental data.
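The intersection step can be sketched if each independence pattern is viewed as a partition of the variable indices into mutually independent blocks (an assumed representation; reading the intersection as the common refinement of partitions is our illustrative interpretation, not the paper's formal definition):

```python
# Each pattern is a partition of variable indices into independent blocks;
# "intersecting" two patterns is read here as their common refinement,
# whose blocks are the non-empty pairwise intersections of blocks.
def refine(p1, p2):
    blocks = {frozenset(b1 & b2) for b1 in p1 for b2 in p2}
    blocks.discard(frozenset())
    return blocks

# two dichotomic (two-block-style) patterns for X = (X_1, X_2, X_3, X_4)
d1 = [{0, 1}, {2, 3}]
d2 = [{0, 1, 2}, {3}]
finest = refine(d1, d2)   # the finer pattern implied by both
```

Iterating `refine` over all patterns in an estimated $\hat{\Delta}(X)$ would then yield the candidate finest pattern, mirroring the estimation procedure in the abstract.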
    Learning from Visual Observation via Offline Pretrained State-to-Go Transformer. (arXiv:2306.12860v1 [cs.LG])
    Learning from visual observation (LfVO), which aims at recovering policies from visual observation data alone, is a promising yet challenging problem. Existing LfVO approaches either adopt inefficient online learning schemes or require additional task-specific information like goal states, making them ill-suited for open-ended tasks. To address these issues, we propose a two-stage framework for learning from visual observation. In the first stage, we introduce and pretrain a State-to-Go (STG) Transformer offline to predict and differentiate latent transitions of demonstrations. Subsequently, in the second stage, the STG Transformer provides intrinsic rewards for downstream reinforcement learning tasks where an agent learns merely from intrinsic rewards. Empirical results on Atari and Minecraft show that our proposed method outperforms baselines and in some tasks even achieves performance comparable to the policy learned from environmental rewards. These results shed light on the potential of utilizing video-only data to solve difficult visual reinforcement learning tasks rather than relying on complete offline datasets containing states, actions, and rewards. The project's website and code can be found at https://sites.google.com/view/stgtransformer.
    Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing. (arXiv:2306.12929v1 [cs.LG])
    Transformer models have been widely adopted in various domains in recent years, and large language models in particular have advanced the field of AI significantly. Due to their size, the capability of these networks has increased tremendously, but this has come at the cost of a significant increase in necessary compute. Quantization is one of the most effective ways to reduce the computational time and memory consumption of neural networks. Many studies have shown, however, that modern transformer models tend to learn strong outliers in their activations, making them difficult to quantize. To retain acceptable performance, these outliers require keeping activations in higher bitwidth or using different numeric formats, extra fine-tuning, or other workarounds. We show that strong outliers are related to very specific behavior of attention heads that try to learn a "no-op" or just a partial update of the residual. To achieve the exact zeros needed in the attention matrix for a no-update, the input to the softmax is pushed to be larger and larger during training, causing outliers in other parts of the network. Based on these observations, we propose two simple (independent) modifications to the attention mechanism - clipped softmax and gated attention. We empirically show that models pre-trained using our methods learn significantly smaller outliers while maintaining, and sometimes even improving, floating-point task performance. This enables full INT8 quantization of the activations without any additional effort. We demonstrate the effectiveness of our methods on both language models (BERT, OPT) and vision transformers.
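The clipped softmax admits a direct sketch: stretch the softmax output by factors ζ > 1 and γ < 0 and clip back to [0, 1], so exact zeros (and ones) become reachable at finite logits (the specific values ζ = 1.1, γ = −0.03 here are illustrative, not the paper's tuned hyperparameters):

```python
import numpy as np

# Sketch of a clipped softmax: clip((zeta - gamma) * softmax(x) + gamma,
# 0, 1) with zeta > 1 and gamma < 0, letting attention weights reach exact
# zeros without pushing the logits toward infinity (the outlier cause
# identified in the abstract). zeta/gamma values are illustrative.
def clipped_softmax(x, zeta=1.1, gamma=-0.03):
    e = np.exp(x - x.max(axis=-1, keepdims=True))   # stable softmax
    s = e / e.sum(axis=-1, keepdims=True)
    return np.clip((zeta - gamma) * s + gamma, 0.0, 1.0)

out = clipped_softmax(np.array([0.0, 0.0, -5.0]))
# the negligible third weight becomes an exact zero: a true "no-op"
```

With a plain softmax, an exact zero requires a logit gap tending to infinity, which is precisely what inflates activations during training; clipping removes that pressure.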
    Beyond Deep Ensembles: A Large-Scale Evaluation of Bayesian Deep Learning under Distribution Shift. (arXiv:2306.12306v2 [cs.LG] UPDATED)
    Bayesian deep learning (BDL) is a promising approach to achieve well-calibrated predictions on distribution-shifted data. Nevertheless, there exists no large-scale survey that evaluates recent SOTA methods on diverse, realistic, and challenging benchmark tasks in a systematic manner. To provide a clear picture of the current state of BDL research, we evaluate modern BDL algorithms on real-world datasets from the WILDS collection containing challenging classification and regression tasks, with a focus on generalization capability and calibration under distribution shift. We compare the algorithms on a wide range of large, convolutional and transformer-based neural network architectures. In particular, we investigate a signed version of the expected calibration error that reveals whether the methods are over- or under-confident, providing further insight into the behavior of the methods. Further, we provide the first systematic evaluation of BDL for fine-tuning large pre-trained models, where training from scratch is prohibitively expensive. Finally, given the recent success of Deep Ensembles, we extend popular single-mode posterior approximations to multiple modes by the use of ensembles. While we find that ensembling single-mode approximations generally improves the generalization capability and calibration of the models by a significant margin, we also identify a failure mode of ensembles when finetuning large transformer-based language models. In this setting, variational inference based approaches such as last-layer Bayes By Backprop outperform other methods in terms of accuracy by a large margin, while modern approximate inference algorithms such as SWAG achieve the best calibration.
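The signed calibration error mentioned above can be sketched as a small variant of the standard ECE (the binning scheme and toy numbers below are illustrative, not necessarily the paper's exact formulation):

```python
import numpy as np

# Hedged sketch of a signed expected calibration error: like ECE, but each
# bin's (confidence - accuracy) gap keeps its sign, so a positive total
# indicates over-confidence and a negative one under-confidence.
def signed_ece(conf, correct, n_bins=10):
    bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    total, n = 0.0, len(conf)
    for b in range(n_bins):
        m = bins == b
        if m.any():
            # bin weight times the signed confidence/accuracy gap
            total += m.sum() / n * (conf[m].mean() - correct[m].mean())
    return total

conf = np.array([0.9, 0.9, 0.9, 0.9])      # model claims 90% confidence
correct = np.array([1.0, 0.0, 0.0, 0.0])   # but is right 25% of the time
gap = signed_ece(conf, correct)            # positive: over-confident
```

Taking absolute values per bin would recover the usual ECE; keeping the sign is what reveals whether a method errs toward over- or under-confidence.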
    A Survey of Deep Learning for Mathematical Reasoning. (arXiv:2212.10535v2 [cs.AI] UPDATED)
    Mathematical reasoning is a fundamental aspect of human intelligence and is applicable in various fields, including science, engineering, finance, and everyday life. The development of artificial intelligence (AI) systems capable of solving math problems and proving theorems has garnered significant interest in the fields of machine learning and natural language processing. For example, mathematics serves as a testbed for aspects of reasoning that are challenging for powerful deep learning models, driving new algorithmic and modeling advances. On the other hand, recent advances in large-scale neural language models have opened up new benchmarks and opportunities to use deep learning for mathematical reasoning. In this survey paper, we review the key tasks, datasets, and methods at the intersection of mathematical reasoning and deep learning over the past decade. We also evaluate existing benchmarks and methods, and discuss future research directions in this domain.
    Wind Noise Reduction with a Diffusion-based Stochastic Regeneration Model. (arXiv:2306.12867v1 [eess.AS])
    In this paper we present a method for single-channel wind noise reduction using our previously proposed diffusion-based stochastic regeneration model combining predictive and generative modelling. We introduce a non-additive speech in noise model to account for the non-linear deformation of the membrane caused by the wind flow and possible clipping. We show that our stochastic regeneration model outperforms other neural-network-based wind noise reduction methods as well as purely predictive and generative models, on a dataset using simulated and real-recorded wind noise. We further show that the proposed method generalizes well by testing on an unseen dataset with real-recorded wind noise. Audio samples, data generation scripts and code for the proposed methods can be found online (https://uhh.de/inf-sp-storm-wind).
    SNAP: Efficient Extraction of Private Properties with Poisoning. (arXiv:2208.12348v2 [cs.LG] UPDATED)
    Property inference attacks allow an adversary to extract global properties of the training dataset from a machine learning model. Such attacks have privacy implications for data owners sharing their datasets to train machine learning models. Several existing approaches for property inference attacks against deep neural networks have been proposed, but they all rely on the attacker training a large number of shadow models, which induces a large computational overhead. In this paper, we consider the setting of property inference attacks in which the attacker can poison a subset of the training dataset and query the trained target model. Motivated by our theoretical analysis of model confidences under poisoning, we design an efficient property inference attack, SNAP, which obtains higher attack success and requires lower amounts of poisoning than the state-of-the-art poisoning-based property inference attack by Mahloujifar et al. For example, on the Census dataset, SNAP achieves 34% higher success rate than Mahloujifar et al. while being 56.5x faster. We also extend our attack to infer whether a certain property was present at all during training and estimate the exact proportion of a property of interest efficiently. We evaluate our attack on several properties of varying proportions from four datasets and demonstrate SNAP's generality and effectiveness. An open-source implementation of SNAP can be found at https://github.com/johnmath/snap-sp23.
    Improved Financial Forecasting via Quantum Machine Learning. (arXiv:2306.12965v1 [q-fin.ST])
    Quantum algorithms have the potential to enhance machine learning across a variety of domains and applications. In this work, we show how quantum machine learning can be used to improve financial forecasting. First, we use classical and quantum Determinantal Point Processes to enhance Random Forest models for churn prediction, improving precision by almost 6%. Second, we design quantum neural network architectures with orthogonal and compound layers for credit risk assessment, which match classical performance with significantly fewer parameters. Our results demonstrate that leveraging quantum ideas can effectively enhance the performance of machine learning, both today as quantum-inspired classical ML solutions, and even more in the future, with the advent of better quantum hardware.
    Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting. (arXiv:2306.13085v1 [cs.LG])
    Most offline reinforcement learning (RL) algorithms return a target policy maximizing a trade-off between (1) the expected performance gain over the behavior policy that collected the dataset, and (2) the risk stemming from the out-of-distribution-ness of the induced state-action occupancy. It follows that the performance of the target policy is strongly related to the performance of the behavior policy and, thus, the trajectory return distribution of the dataset. We show that in mixed datasets consisting of mostly low-return trajectories and a minority of high-return trajectories, state-of-the-art offline RL algorithms are overly restrained by the low-return trajectories and fail to exploit the high-performing trajectories to the fullest. To overcome this issue, we show that, in deterministic MDPs with stochastic initial states, the dataset sampling can be re-weighted to induce an artificial dataset whose behavior policy has a higher return. This re-weighted sampling strategy may be combined with any offline RL algorithm. We further show that the opportunity for performance improvement over the behavior policy correlates with the positive-sided variance of the returns of the trajectories in the dataset. We empirically show that while CQL, IQL, and TD3+BC achieve only a part of this potential policy improvement, these same algorithms combined with our reweighted sampling strategy fully exploit the dataset. Furthermore, we empirically demonstrate that, despite its theoretical limitation, the approach may still be efficient in stochastic environments. The code is available at https://github.com/Improbable-AI/harness-offline-rl.
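The re-weighting idea can be sketched on synthetic trajectory returns (the softmax-style weighting below is an illustrative choice, not necessarily the paper's exact scheme): sample trajectories with probability increasing in their return, so the implied behavior policy of the resampled dataset has a higher return.

```python
import numpy as np

# Sketch of return-based trajectory re-weighting on a synthetic mixed
# dataset: 90% low-return and 10% high-return trajectories. The softmax
# temperature is an illustrative choice.
rng = np.random.default_rng(4)
returns = np.concatenate([rng.normal(0, 1, 900),     # many low-return
                          rng.normal(10, 1, 100)])   # few high-return

temp = 2.0
w = np.exp((returns - returns.max()) / temp)         # favor high returns
w /= w.sum()

resampled = rng.choice(returns, size=1000, p=w, replace=True)
# the resampled dataset's mean return is far above the original's
```

Any offline RL algorithm can then be trained on mini-batches drawn with these weights instead of uniformly, which is what lets CQL, IQL, or TD3+BC exploit the high-return trajectories.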
    Can a single image processing algorithm work equally well across all phases of DCE-MRI?. (arXiv:2306.12988v1 [eess.IV])
    Image segmentation and registration are known to be challenging when applied to dynamic contrast-enhanced MRI sequences (DCE-MRI). The contrast agent causes rapid changes in intensity in the region of interest and elsewhere, which can lead to false positive predictions for segmentation tasks and confound the image registration similarity metric. While it is widely assumed that contrast changes increase the difficulty of these tasks, to our knowledge no work has quantified these effects. In this paper we examine the effect of training with different ratios of contrast-enhanced (CE) data on two popular tasks: segmentation with nnU-Net and Mask R-CNN, and registration using VoxelMorph and VTN. We experimented further by strategically using the available datasets through pretraining and fine-tuning with different splits of data. We found that to create a generalisable model, pretraining with CE data and fine-tuning with non-CE data gave the best result. This finding could be extended to other deep-learning-based image processing tasks with DCE-MRI and provide significant improvements to model performance.
    Enhancing variational quantum state diagonalization using reinforcement learning techniques. (arXiv:2306.11086v2 [quant-ph] UPDATED)
    The development of variational quantum algorithms is crucial for the application of NISQ computers. Such algorithms require short quantum circuits, which are more amenable to implementation on near-term hardware, and many such methods have been developed. One of particular interest is the so-called variational diagonalization method, which constitutes an important algorithmic subroutine and can be used directly for working with data encoded in quantum states. In particular, it can be applied to discern the features of quantum states, such as entanglement properties of a system, or in quantum machine learning algorithms. In this work, we tackle the problem of designing the very shallow quantum circuit required in the quantum state diagonalization task by utilizing reinforcement learning. To achieve this, we use a novel encoding method that allows the circuit depth optimization problem to be addressed with a reinforcement learning approach. We demonstrate that our approach provides a solid approximation to the diagonalization task while using a small number of gates. The circuits proposed by the reinforcement learning methods are shallower than in the standard variational quantum state diagonalization algorithm, and thus can be used in situations where the depth of quantum circuits is limited by the hardware capabilities.
    AdCraft: An Advanced Reinforcement Learning Benchmark Environment for Search Engine Marketing Optimization. (arXiv:2306.11971v2 [cs.LG] UPDATED)
    We introduce AdCraft, a novel benchmark environment for the Reinforcement Learning (RL) community distinguished by its stochastic and non-stationary properties. The environment simulates bidding and budgeting dynamics within Search Engine Marketing (SEM), a digital marketing technique utilizing paid advertising to enhance the visibility of websites on search engine results pages (SERPs). The performance of SEM advertisement campaigns depends on several factors, including keyword selection, ad design, bid management, budget adjustments, and performance monitoring. Deep RL recently emerged as a potential strategy to optimize campaign profitability within the complex and dynamic landscape of SEM but it requires substantial data, which may be costly or infeasible to acquire in practice. Our customizable environment enables practitioners to assess and enhance the robustness of RL algorithms pertinent to SEM bid and budget management without such costs. Through a series of experiments within the environment, we demonstrate the challenges imposed by sparsity and non-stationarity on agent convergence and performance. We hope these challenges further encourage discourse and development around effective strategies for managing real-world uncertainties.
    Exploring the Training Robustness of Distributional Reinforcement Learning against Noisy State Observations. (arXiv:2109.08776v5 [cs.LG] UPDATED)
    In real scenarios, the state observations that an agent observes may contain measurement errors or adversarial noise, misleading the agent into taking suboptimal actions or even collapsing during training. In this paper, we study the training robustness of distributional Reinforcement Learning (RL), a class of state-of-the-art methods that estimate the whole distribution, as opposed to only the expectation, of the total return. First, we validate the contraction of distributional Bellman operators in the State-Noisy Markov Decision Process (SN-MDP), a typical tabular case that incorporates both random and adversarial state observation noise. In the noisy setting with function approximation, we then analyze the vulnerability of the least squares loss in expectation-based RL with either linear or nonlinear function approximation. By contrast, we theoretically characterize the bounded gradient norm of the distributional RL loss based on the categorical parameterization equipped with the KL divergence. The resulting stable gradients during optimization account for distributional RL's better training robustness against state observation noise. Finally, extensive experiments on a suite of environments verify that distributional RL is less vulnerable to both random and adversarial noisy state observations than its expectation-based counterpart.
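The bounded-gradient contrast can be illustrated numerically (toy logits and targets, not the paper's SN-MDP analysis): the KL/cross-entropy gradient with respect to categorical logits is p − target and stays bounded, while the squared-loss gradient on an unbounded value estimate grows with the error.

```python
import numpy as np

# Toy illustration of the gradient-norm argument: for a categorical
# parameterization with a KL/cross-entropy loss, the gradient w.r.t. the
# logits is (p - target), bounded in norm by sqrt(2), while the squared
# loss on an unbounded value estimate has gradient (v - y) that grows
# without limit as the prediction error grows.
def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

target = np.array([1.0, 0.0, 0.0])
for scale in [1.0, 10.0, 100.0]:
    logits = np.array([-scale, scale, 0.0])   # increasingly wrong logits
    grad_kl = softmax(logits) - target        # bounded regardless of scale
    v, y = scale, 0.0
    grad_mse = v - y                          # grows linearly with error
```

Under noisy observations this matters: the categorical loss cannot produce exploding gradient steps, which is the stability mechanism the abstract credits for distributional RL's robustness.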
    Rapid building damage assessment workflow: An implementation for the 2023 Rolling Fork, Mississippi tornado event. (arXiv:2306.12589v1 [cs.CV])
    Rapid and accurate building damage assessments from high-resolution satellite imagery following a natural disaster are essential to inform and optimize first responder efforts. However, performing such building damage assessments in an automated manner is non-trivial due to the challenges posed by variations in disaster-specific damage, diversity in satellite imagery, and the dearth of extensive, labeled datasets. To circumvent these issues, this paper introduces a human-in-the-loop workflow for rapidly training building damage assessment models after a natural disaster. This article details a case study using this workflow, executed in partnership with the American Red Cross during a tornado event in Rolling Fork, Mississippi in March 2023. The output from our human-in-the-loop modeling process achieved a precision of 0.86 and recall of 0.80 for damaged buildings when compared to ground truth data collected post-disaster. This workflow was implemented end-to-end in under 2 hours per satellite imagery scene, highlighting its potential for real-time deployment.
    A Comparison of Time-based Models for Multimodal Emotion Recognition. (arXiv:2306.13076v1 [cs.LG])
    Emotion recognition has become an important research topic in the field of human-computer interaction. Studies on sound and video for understanding emotions have focused mainly on analyzing facial expressions and classifying six basic emotions. In this study, the performance of different sequence models in multi-modal emotion recognition was compared. The sound and images were first processed by multi-layered CNN models, and the outputs of these models were fed into various sequence models: GRU, Transformer, LSTM, and Max Pooling. Accuracy, precision, and F1 score values of all models were calculated. The multi-modal CREMA-D dataset was used in the experiments. On the CREMA-D dataset, the GRU-based architecture showed the best F1 score with 0.640, the LSTM-based architecture was best in the precision metric with 0.699, while the Max Pooling-based architecture gave the best sensitivity with 0.620. Overall, the sequence models performed comparably to each other.
    IGB: Addressing The Gaps In Labeling, Features, Heterogeneity, and Size of Public Graph Datasets for Deep Learning Research. (arXiv:2302.13522v2 [cs.LG] UPDATED)
    Graph neural networks (GNNs) have shown high potential for a variety of real-world, challenging applications, but one of the major obstacles in GNN research is the lack of large-scale flexible datasets. Most existing public datasets for GNNs are relatively small, which limits the ability of GNNs to generalize to unseen data. The few existing large-scale graph datasets provide very limited labeled data. This makes it difficult to determine if the GNN model's low accuracy for unseen data is inherently due to insufficient training data or if the model failed to generalize. Additionally, datasets used to train GNNs need to offer flexibility to enable a thorough study of the impact of various factors while training GNN models. In this work, we introduce the Illinois Graph Benchmark (IGB), a research dataset tool that the developers can use to train, scrutinize and systematically evaluate GNN models with high fidelity. IGB includes both homogeneous and heterogeneous academic graphs of enormous sizes, with more than 40% of their nodes labeled. Compared to the largest graph datasets publicly available, the IGB provides over 162X more labeled data for deep learning practitioners and developers to create and evaluate models with higher accuracy. The IGB dataset is a collection of academic graphs designed to be flexible, enabling the study of various GNN architectures, embedding generation techniques, and analyzing system performance issues for node classification tasks. IGB is open-sourced, supports DGL and PyG frameworks, and comes with releases of the raw text that we believe foster emerging language models and GNN research projects. An early public version of IGB is available at https://github.com/IllinoisGraphBenchmark/IGB-Datasets.
    Efficient Partitioning Method of Large-Scale Public Safety Spatio-Temporal Data based on Information Loss Constraints. (arXiv:2306.12857v1 [cs.LG])
    The storage, management, and application of massive spatio-temporal data are required in various practical scenarios, including public safety. However, due to the unique spatio-temporal distribution characteristics of real-world data, most existing methods have limitations in terms of the spatio-temporal proximity of data and load balancing in distributed storage. Therefore, this paper proposes an efficient partitioning method of large-scale public safety spatio-temporal data based on information loss constraints (IFL-LSTP). The IFL-LSTP model specifically targets large-scale spatio-temporal point data by combining the spatio-temporal partitioning module (STPM) with the graph partitioning module (GPM). This approach can significantly reduce the scale of data while maintaining the model's accuracy, in order to improve the partitioning efficiency. It can also ensure the load balancing of distributed storage while maintaining spatio-temporal proximity of the data partitioning results. This method provides a new solution for distributed storage of massive spatio-temporal data. The experimental results on multiple real-world datasets demonstrate the effectiveness and superiority of IFL-LSTP.
    GIMLET: A Unified Graph-Text Model for Instruction-Based Molecule Zero-Shot Learning. (arXiv:2306.13089v1 [cs.LG])
    Molecule property prediction has gained significant attention in recent years. The main bottleneck is label insufficiency caused by expensive lab experiments. In order to alleviate this issue and to better leverage textual knowledge for such tasks, this study investigates the feasibility of employing natural language instructions to accomplish molecule-related tasks in a zero-shot setting. We discover that existing molecule-text models perform poorly in this setting due to inadequate treatment of instructions and limited capacity for graphs. To overcome these issues, we propose GIMLET, which unifies language models for both graph and text data. By adopting generalized position embedding, our model is extended to encode both graph structures and instruction text without additional graph encoding modules. GIMLET also decouples the encoding of the graph from task instructions in the attention mechanism, enhancing the generalization of graph features across novel tasks. We construct a dataset consisting of more than two thousand molecule tasks with corresponding instructions derived from task descriptions. We pretrain GIMLET on the molecule tasks along with instructions, enabling the model to transfer effectively to a broad range of tasks. Experimental results demonstrate that GIMLET significantly outperforms molecule-text baselines in instruction-based zero-shot learning, even achieving results close to supervised GNN models on tasks such as ToxCast and MUV.
    Mitigating Discrimination in Insurance with Wasserstein Barycenters. (arXiv:2306.12912v1 [stat.ML])
    The insurance industry is heavily reliant on predictions of risks based on characteristics of potential customers. Although the use of such models is common, researchers have long pointed out that the practice perpetuates discrimination based on sensitive features such as gender or race. Given that such discrimination can often be attributed to historical data biases, an elimination or at least mitigation is desirable. With the shift from more traditional models to machine-learning based predictions, calls for greater mitigation have grown anew, as simply excluding sensitive variables in the pricing process can be shown to be ineffective. In this article, we first investigate why predictions are a necessity within the industry and why correcting biases is not as straightforward as simply identifying a sensitive variable. We then propose to mitigate the biases through the use of Wasserstein barycenters instead of simple scaling. To demonstrate the effects and effectiveness of the approach, we apply it to real data and discuss its implications.
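As a minimal one-dimensional illustration of the barycenter idea (toy data and helper names of our own, not the paper's implementation): in 1-D, the Wasserstein barycenter's quantile function is the weighted average of the groups' quantile functions, so each individual's score can be mapped to the barycenter via its within-group rank.

```python
# Toy 1-D sketch: replace group-dependent score distributions by their
# Wasserstein barycenter. All data and function names are illustrative.

def empirical_quantile(sorted_vals, q):
    """Linear-interpolation quantile of a sorted sample, q in [0, 1]."""
    idx = q * (len(sorted_vals) - 1)
    lo, hi = int(idx), min(int(idx) + 1, len(sorted_vals) - 1)
    frac = idx - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def barycenter_correct(scores_by_group, weights):
    """Map every score to the barycenter via its within-group rank."""
    sorted_groups = {g: sorted(v) for g, v in scores_by_group.items()}
    corrected = {}
    for g, vals in scores_by_group.items():
        denom = max(len(vals) - 1, 1)
        ranks = {v: sorted_groups[g].index(v) / denom for v in vals}
        corrected[g] = [
            sum(w * empirical_quantile(sorted_groups[h], ranks[v])
                for h, w in weights.items())
            for v in vals
        ]
    return corrected

# Two groups whose scores differ only by a constant shift:
scores = {"A": [1.0, 2.0, 3.0], "B": [2.0, 3.0, 4.0]}
out = barycenter_correct(scores, {"A": 0.5, "B": 0.5})
# After correction both groups share the same distribution:
# A -> [1.5, 2.5, 3.5], B -> [1.5, 2.5, 3.5]
```

The mapping removes the group-level shift while preserving each individual's rank within their group, which is the sense in which the barycenter correction is less crude than simple scaling.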
    Efficient Learning of Locomotion Skills through the Discovery of Diverse Environmental Trajectory Generator Priors. (arXiv:2210.04819v2 [cs.NE] UPDATED)
    Data-driven learning-based methods have recently been particularly successful at learning robust locomotion controllers for a variety of unstructured terrains. Prior work has shown that incorporating good locomotion priors in the form of trajectory generators (TGs) is effective at efficiently learning complex locomotion skills. However, defining a good, single TG as tasks/environments become increasingly complex remains a challenging problem, as it requires extensive tuning and risks reducing the effectiveness of the prior. In this paper, we present Evolved Environmental Trajectory Generators (EETG), a method that learns a diverse set of specialised locomotion priors using Quality-Diversity algorithms while maintaining a single policy within the Policies Modulating TG (PMTG) architecture. The results demonstrate that EETG enables a quadruped robot to successfully traverse a wide range of environments, such as slopes, stairs, rough terrain, and balance beams. Our experiments show that learning a diverse set of specialized TG priors is significantly (5 times) more efficient than using a single, fixed prior when dealing with a wide range of environments.
    FuXi: A cascade machine learning forecasting system for 15-day global weather forecast. (arXiv:2306.12873v1 [physics.ao-ph])
    Over the past few years, due to the rapid development of machine learning (ML) models for weather forecasting, state-of-the-art ML models have shown superior performance compared to the European Centre for Medium-Range Weather Forecasts (ECMWF)'s high-resolution forecast (HRES) in 10-day forecasts at a spatial resolution of 0.25 degree. However, the challenge remains to perform comparably to the ECMWF ensemble mean (EM) in 15-day forecasts. Previous studies have demonstrated the importance of mitigating the accumulation of forecast errors for effective long-term forecasts. Despite numerous efforts to reduce accumulation errors, including autoregressive multi-time step loss, a single model is found to be insufficient to achieve optimal performance at both short and long lead times. Therefore, we present FuXi, a cascaded ML weather forecasting system that provides 15-day global forecasts with a temporal resolution of 6 hours and a spatial resolution of 0.25 degree. FuXi is developed using 39 years of the ECMWF ERA5 reanalysis dataset. The performance evaluation, based on latitude-weighted root mean square error (RMSE) and anomaly correlation coefficient (ACC), demonstrates that FuXi has forecast performance comparable to the ECMWF EM in 15-day forecasts, making FuXi the first ML-based weather forecasting system to achieve this.
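The latitude-weighted RMSE mentioned above weights each grid cell by the cosine of its latitude, so the shrinking area of cells near the poles does not dominate the score. A minimal sketch on a synthetic grid (toy values, not ERA5 data):

```python
import math

def lat_weighted_rmse(forecast, truth, lats_deg):
    """forecast/truth: lists of rows (one per latitude); lats_deg: row latitudes."""
    w = [math.cos(math.radians(lat)) for lat in lats_deg]
    norm = sum(w) / len(w)  # normalize so the weights average to 1
    total, count = 0.0, 0
    for row_f, row_t, wi in zip(forecast, truth, w):
        for f, t in zip(row_f, row_t):
            total += (wi / norm) * (f - t) ** 2
            count += 1
    return math.sqrt(total / count)

truth    = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
forecast = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
err = lat_weighted_rmse(forecast, truth, [60.0, 0.0, -60.0])
# Every cell has error 1, so the weighted RMSE is exactly 1 regardless of weights.
```

With non-uniform errors the cosine weighting matters: an error confined to polar rows contributes less than the same error in tropical rows.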
    DEPAC: a Corpus for Depression and Anxiety Detection from Speech. (arXiv:2306.12443v1 [eess.AS])
    Mental distress conditions such as depression and anxiety contribute to the largest proportion of the global burden of disease. Automated diagnosis systems for such disorders, empowered by recent innovations in Artificial Intelligence, can pave the way to reducing the suffering of affected individuals. The development of such systems requires information-rich and balanced corpora. In this work, we introduce DEPAC, a novel mental distress analysis audio dataset labeled based on established thresholds on standard depression and anxiety screening tools. This large dataset comprises multiple speech tasks per individual, as well as relevant demographic information. Alongside the corpus, we present a feature set consisting of hand-curated acoustic and linguistic features, which were found effective in identifying signs of mental illness in human speech. Finally, we justify the quality and effectiveness of our proposed audio corpus and feature set in predicting depression severity by comparing the performance of baseline machine learning models built on this dataset with baseline models trained on other well-known depression corpora.
    The Power of Linear Combinations: Learning with Random Convolutions. (arXiv:2301.11360v2 [cs.CV] UPDATED)
    Following the traditional paradigm of convolutional neural networks (CNNs), modern CNNs manage to keep pace with more recent, for example transformer-based, models by increasing not only model depth and width but also kernel size. This results in large amounts of learnable model parameters that need to be handled during training. While following the convolutional paradigm with its corresponding spatial inductive bias, we question the significance of \emph{learned} convolution filters. In fact, our findings demonstrate that many contemporary CNN architectures can achieve high test accuracies without ever updating randomly initialized (spatial) convolution filters. Instead, simple linear combinations (implemented through efficient $1\times 1$ convolutions) suffice to effectively recombine even random filters into expressive network operators. Furthermore, these combinations of random filters can implicitly regularize the resulting operations, mitigating overfitting and enhancing overall performance and robustness. Conversely, retaining the ability to learn filter updates can impair network performance. Lastly, although we only observe relatively small gains from learning $3\times 3$ convolutions, the learning gains increase proportionally with kernel size, owing to the non-idealities of the independent and identically distributed (\textit{i.i.d.}) nature of default initialization techniques.
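The core observation can be sketched numerically (our own toy setup, not the paper's code): a learned $1\times 1$ convolution over the outputs of frozen random $K\times K$ filters is, by linearity, equivalent to convolving with the corresponding linear combination of those random filters, so expressive operators can be recombined from random ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(img, kernel):
    """Plain 'valid' 2-D cross-correlation for illustration."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

img = rng.standard_normal((8, 8))
random_filters = rng.standard_normal((4, 3, 3))  # frozen, never trained
weights = rng.standard_normal(4)                 # the learned 1x1 combination

# Path A: convolve with each random filter, then combine with the 1x1 weights.
combined_maps = sum(w * conv2d_valid(img, f)
                    for w, f in zip(weights, random_filters))
# Path B: combine the filters first, then convolve once.
combined_filter = np.tensordot(weights, random_filters, axes=1)
direct = conv2d_valid(img, combined_filter)

assert np.allclose(combined_maps, direct)
```

In a real network the $1\times 1$ weights are the only spatial-mixing parameters that get gradient updates; the equivalence above is why that restricted parameterization loses so little expressivity.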
    Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation. (arXiv:2302.03673v3 [cs.LG] UPDATED)
    We propose a new model, independent linear Markov game, for multi-agent reinforcement learning with a large state space and a large number of agents. This is a class of Markov games with independent linear function approximation, where each agent has its own function approximation for the state-action value functions that are marginalized by other players' policies. We design new algorithms for learning the Markov coarse correlated equilibria (CCE) and Markov correlated equilibria (CE) with sample complexity bounds that only scale polynomially with each agent's own function class complexity, thus breaking the curse of multiagents. In contrast, existing works for Markov games with function approximation have sample complexity bounds that scale with the size of the \emph{joint action space} when specialized to the canonical tabular Markov game setting, which is exponentially large in the number of agents. Our algorithms rely on two key technical innovations: (1) utilizing policy replay to tackle non-stationarity incurred by multiple agents and the use of function approximation; (2) separating learning Markov equilibria and exploration in the Markov games, which allows us to use the full-information no-regret learning oracle instead of the stronger bandit-feedback no-regret learning oracle used in the tabular setting. Furthermore, we propose an iterative-best-response type algorithm that can learn pure Markov Nash equilibria in independent linear Markov potential games. In the tabular case, by adapting the policy replay mechanism for independent linear Markov games, we propose an algorithm with $\widetilde{O}(\epsilon^{-2})$ sample complexity to learn Markov CCE, which improves the state-of-the-art result $\widetilde{O}(\epsilon^{-3})$ of Daskalakis et al. (2022), where $\epsilon$ is the desired accuracy, and also significantly improves other problem parameters.
    Sum-Rate Maximization of RSMA-based Aerial Communications with Energy Harvesting: A Reinforcement Learning Approach. (arXiv:2306.12977v1 [cs.IT])
    In this letter, we investigate a joint power and beamforming design problem for rate-splitting multiple access (RSMA)-based aerial communications with energy harvesting, where a self-sustainable aerial base station serves multiple users by utilizing the harvested energy. Considering maximizing the sum-rate from the long-term perspective, we utilize a deep reinforcement learning (DRL) approach, namely the soft actor-critic algorithm, to restrict the maximum transmission power at each time based on the stochastic property of the channel environment, harvested energy, and battery power information. Moreover, for designing precoders and power allocation among all the private/common streams of the RSMA, we employ sequential least squares programming (SLSQP) using the Han-Powell quasi-Newton method to maximize the sum-rate for the given transmission power via DRL. Numerical results show the superiority of the proposed scheme over several baseline methods in terms of the average sum-rate performance.
    Towards Explainable Evaluation Metrics for Machine Translation. (arXiv:2306.13041v1 [cs.CL])
    Unlike classical lexical overlap metrics such as BLEU, most current evaluation metrics for machine translation (for example, COMET or BERTScore) are based on black-box large language models. They often achieve strong correlations with human judgments, but recent research indicates that the lower-quality classical metrics remain dominant, one of the potential reasons being that their decision processes are more transparent. To foster more widespread acceptance of novel high-quality metrics, explainability thus becomes crucial. In this concept paper, we identify key properties as well as key goals of explainable machine translation metrics and provide a comprehensive synthesis of recent techniques, relating them to our established goals and properties. In this context, we also discuss the latest state-of-the-art approaches to explainable metrics based on generative models such as ChatGPT and GPT-4. Finally, we contribute a vision of next-generation approaches, including natural language explanations. We hope that our work can help catalyze and guide future research on explainable evaluation metrics and, indirectly, also contribute to better and more transparent machine translation systems.
    StrainNet: Predicting crystal structure elastic properties using SE(3)-equivariant graph neural networks. (arXiv:2306.12818v1 [cond-mat.dis-nn])
    Accurately predicting the elastic properties of crystalline solids is vital for computational materials science. However, traditional atomistic scale ab initio approaches are computationally intensive, especially for studying complex materials with a large number of atoms in a unit cell. We introduce a novel data-driven approach to efficiently predict the elastic properties of crystal structures using SE(3)-equivariant graph neural networks (GNNs). This approach yields important scalar elastic moduli with accuracy comparable to that of recent data-driven studies. Importantly, our symmetry-aware GNN model also enables the prediction of the strain energy density (SED) and the associated elastic constants, the fundamental tensorial quantities that are significantly influenced by a material's crystallographic group. The model consistently distinguishes independent elements of SED tensors, in accordance with the symmetry of the crystal structures. Finally, our deep learning model possesses meaningful latent features, offering an interpretable prediction of the elastic properties.
    In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD. (arXiv:2306.12900v1 [cs.LG])
    Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations. As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks. Additionally, performing inference at runtime requires non-trivial coupling of ML framework libraries with simulation codes. This work offers a solution to both limitations by simplifying this coupling and enabling in situ training and inference workflows on heterogeneous clusters. Leveraging SmartSim, the presented framework deploys a database to store data and ML models in memory, thus circumventing the file system. On the Polaris supercomputer, we demonstrate perfect scaling efficiency of the data transfer and inference costs up to the full machine size, thanks to a novel co-located deployment of the database. Moreover, we train an autoencoder in situ from a turbulent flow simulation, showing that the framework overhead is negligible relative to a solver time step and training epoch.
    Transferable Curricula through Difficulty Conditioned Generators. (arXiv:2306.13028v1 [cs.AI])
    Advancements in reinforcement learning (RL) have demonstrated superhuman performance in complex tasks such as Starcraft, Go, and Chess. However, knowledge transfer from Artificial "Experts" to humans remains a significant challenge. A promising avenue for such transfer is the use of curricula. Recent methods in curriculum generation focus on training RL agents efficiently, yet such methods rely on surrogate measures to track student progress and are not suited for training robots in the real world (or, more ambitiously, humans). In this paper, we introduce a method named Parameterized Environment Response Model (PERM) that shows promising results in training RL agents in parameterized environments. Inspired by Item Response Theory, PERM seeks to model the difficulty of environments and the ability of RL agents directly. Given that RL agents and humans are trained more efficiently within the "zone of proximal development", our method generates a curriculum by matching the difficulty of an environment to the current ability of the student. In addition, PERM can be trained offline and does not employ non-stationary measures of student ability, making it suitable for transfer between students. We demonstrate PERM's ability to represent the environment parameter space, and training RL agents with PERM produces strong performance in deterministic environments. Lastly, we show that our method is transferable between students, without any sacrifice in training quality.
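The Item Response Theory intuition can be made concrete with a hedged sketch (not the authors' model; parameter values are illustrative): a one-parameter logistic model predicts the probability that a student of ability `theta` solves an environment of difficulty `b`, and the curriculum picks the environment whose predicted success rate is closest to a target in the zone of proximal development.

```python
import math

def p_success(theta, b):
    """1-parameter logistic (Rasch-style) success probability."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def pick_environment(theta, difficulties, target=0.5):
    """Choose the difficulty whose predicted success rate is nearest the target."""
    return min(difficulties, key=lambda b: abs(p_success(theta, b) - target))

envs = [-2.0, 0.0, 1.0, 3.0]   # illustrative environment difficulty parameters
chosen = pick_environment(theta=1.0, difficulties=envs)
# With target success 0.5, the matched environment has difficulty 1.0.
```

As the student's estimated ability rises, the same rule automatically advances the curriculum toward harder environments, and because the difficulty parameters belong to the environments rather than to any one student, the curriculum transfers between students.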
    Otter-Knowledge: benchmarks of multimodal knowledge graph representation learning from different sources for drug discovery. (arXiv:2306.12802v1 [cs.LG])
    Recent research in representation learning utilizes large databases of proteins or molecules to acquire knowledge of drug and protein structures through unsupervised learning techniques. These pre-trained representations have proven to significantly enhance the accuracy of subsequent tasks, such as predicting the affinity between drugs and target proteins. In this study, we demonstrate that by incorporating knowledge graphs from diverse sources and modalities into the sequences or SMILES representation, we can further enrich the representation and achieve state-of-the-art results on established benchmark datasets. We provide preprocessed and integrated data obtained from 7 public sources, which encompass over 30M triples. Additionally, we make available the pre-trained models based on this data, along with their reported performance on three widely used benchmark datasets for drug-target binding affinity prediction found in the Therapeutic Data Commons (TDC) benchmarks. We also make the source code for training models on the benchmark datasets publicly available. Our objective in releasing these pre-trained models, accompanied by clean data for model pretraining and benchmark results, is to encourage research in knowledge-enhanced representation learning.
    Towards Antisymmetric Neural Ansatz Separation. (arXiv:2208.03264v3 [cs.LG] UPDATED)
    We study separations between two fundamental models (or \emph{Ans\"atze}) of antisymmetric functions, that is, functions $f$ of the form $f(x_{\sigma(1)}, \ldots, x_{\sigma(N)}) = \text{sign}(\sigma)f(x_1, \ldots, x_N)$, where $\sigma$ is any permutation. These arise in the context of quantum chemistry, and are the basic modeling tool for wavefunctions of Fermionic systems. Specifically, we consider two popular antisymmetric Ans\"atze: the Slater representation, which leverages the alternating structure of determinants, and the Jastrow ansatz, which augments Slater determinants with a product by an arbitrary symmetric function. We construct an antisymmetric function in $N$ dimensions that can be efficiently expressed in Jastrow form, yet provably cannot be approximated by Slater determinants unless there are exponentially (in $N^2$) many terms. This represents the first explicit quantitative separation between these two Ans\"atze.
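A concrete instance of the Slater representation helps fix ideas (our own toy example with monomial orbitals, $N = 3$): building $f$ as the determinant of a matrix of single-particle orbitals $\phi_i(x_j)$ makes it antisymmetric by construction, so swapping any two arguments flips the sign, exactly as $f(x_{\sigma(1)}, \ldots, x_{\sigma(N)}) = \text{sign}(\sigma) f(x_1, \ldots, x_N)$ requires.

```python
def slater(xs):
    """det of M with M[i][j] = phi_i(x_j), using phi_i(x) = x**i and N = 3."""
    m = [[x ** i for x in xs] for i in range(len(xs))]
    # 3x3 determinant by cofactor expansion along the first row
    (a, b, c), (d, e, f), (g, h, k) = m
    return a * (e * k - f * h) - b * (d * k - f * g) + c * (d * h - e * g)

val = slater([0.5, 1.0, 2.0])
swapped = slater([1.0, 0.5, 2.0])  # transposition of the first two particles
# swapped == -val: an odd permutation flips the sign, as antisymmetry demands.
```

With monomial orbitals this determinant is the Vandermonde determinant $\prod_{i<j}(x_j - x_i)$, which makes the sign flip under a transposition easy to see by hand. The paper's separation result says that some functions expressible with a single Jastrow factor need exponentially many such determinants.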
    The quantum cost function concentration dependency on the parametrization expressivity. (arXiv:2301.06883v2 [quant-ph] UPDATED)
    Although we are currently in the era of noisy intermediate scale quantum devices, several studies are being conducted with the aim of bringing machine learning to the quantum domain. Currently, quantum variational circuits are one of the main strategies used to build such models. However, despite its widespread use, we still do not know what are the minimum resources needed to create a quantum machine learning model. In this article, we analyze how the expressiveness of the parametrization affects the cost function. We analytically show that the more expressive the parametrization is, the more the cost function will tend to concentrate around a value that depends both on the chosen observable and on the number of qubits used. For this, we initially obtain a relationship between the expressiveness of the parametrization and the mean value of the cost function. Afterwards, we relate the expressivity of the parametrization with the variance of the cost function. Finally, we show some numerical simulation results that confirm our theoretical-analytical predictions. To the best of our knowledge, this is the first time that these two important aspects of quantum neural networks are explicitly connected.
    Algorithmic decision making methods for fair credit scoring. (arXiv:2209.07912v3 [cs.LG] UPDATED)
    The effectiveness of machine learning in evaluating the creditworthiness of loan applicants has long been demonstrated. However, there is concern that the use of automated decision-making processes may result in unequal treatment of groups or individuals, potentially leading to discriminatory outcomes. This paper seeks to address this issue by evaluating the effectiveness of 12 leading bias mitigation methods across 5 different fairness metrics, as well as assessing their accuracy and potential profitability for financial institutions. Through our analysis, we have identified the challenges associated with achieving fairness while maintaining accuracy and profitability, and have highlighted both the most successful and least successful mitigation methods. Ultimately, our research serves to bridge the gap between experimental machine learning and its practical applications in the finance industry.
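One of the standard group-fairness metrics used in evaluations of this kind is the demographic parity difference: the gap in approval rates between groups. A minimal sketch with made-up data (this is a generic metric, not necessarily one of the paper's five):

```python
def demographic_parity_diff(decisions, groups):
    """decisions: 1 = approved, 0 = denied; groups: group label per applicant."""
    rate = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        rate[g] = sum(decisions[i] for i in idx) / len(idx)
    vals = sorted(rate.values())
    return vals[-1] - vals[0]  # gap between best- and worst-treated group

decisions = [1, 1, 0, 1, 0, 0, 0, 1]
groups    = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_diff(decisions, groups)
# Group "a" approval rate is 0.75, group "b" is 0.25, so the gap is 0.5.
```

A mitigation method is judged by how far it pushes such gaps toward zero while holding predictive accuracy, and for lenders profitability, as close as possible to the unmitigated model.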
    An Interactive Interface for Novel Class Discovery in Tabular Data. (arXiv:2306.12919v1 [cs.LG])
    Novel Class Discovery (NCD) is the problem of trying to discover novel classes in an unlabeled set, given a labeled set of different but related classes. The majority of NCD methods proposed so far only deal with image data, despite tabular data being among the most widely used types of data in practical applications. To interpret the results of clustering or NCD algorithms, data scientists need to understand the domain- and application-specific attributes of tabular data. This task is difficult and can often only be performed by a domain expert. Therefore, we propose an interactive interface that allows a domain expert to easily run state-of-the-art algorithms for NCD on tabular data. With minimal knowledge of data science, interpretable results can be generated.
    Adaptive Bernstein Change Detector for High-Dimensional Data Streams. (arXiv:2306.12974v1 [cs.LG])
    Change detection is of fundamental importance when analyzing data streams. Detecting changes both quickly and accurately enables monitoring and prediction systems to react, e.g., by issuing an alarm or by updating a learning algorithm. However, detecting changes is challenging when observations are high-dimensional. In high-dimensional data, change detectors should not only be able to identify when changes happen, but also in which subspace they occur. Ideally, one should also quantify how severe they are. Our approach, ABCD, has these properties. ABCD learns an encoder-decoder model and monitors its accuracy over a window of adaptive size. ABCD derives a change score based on Bernstein's inequality to detect deviations in terms of accuracy, which indicate changes. Our experiments demonstrate that ABCD outperforms its best competitor by at least 8% and up to 23% in F1-score on average. It can also accurately estimate changes' subspace, together with a severity measure that correlates with the ground truth.
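The role of Bernstein's inequality can be sketched with a hedged toy (not the ABCD algorithm itself; all numbers are illustrative): given an observed gap `eps` between mean reconstruction errors before and after a candidate change point, the inequality bounds the probability of such a gap under "no change", and a tiny bound flags the gap as significant.

```python
import math

def bernstein_tail(eps, n, var, rng):
    """Bernstein bound on P(|mean of n centered vars| >= eps), each bounded by rng."""
    return 2.0 * math.exp(-n * eps ** 2 / (2.0 * var + (2.0 / 3.0) * rng * eps))

def is_change(eps, n, var, rng, delta=0.05):
    """Flag a change when the no-change probability bound drops below delta."""
    return bernstein_tail(eps, n, var, rng) < delta

flag = is_change(eps=0.5, n=200, var=0.04, rng=1.0)
# A 0.5 shift over 200 low-variance points is flagged as a change; a 0.01
# shift over 50 points is not.
```

Because the bound tightens with both sample size and low variance, the same machinery yields a graded change score rather than a binary alarm, which is what lets a detector also report how severe a change is.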
    Context-lumpable stochastic bandits. (arXiv:2306.13053v1 [cs.LG])
    We consider a contextual bandit problem with $S$ contexts and $A$ actions. In each round $t=1,2,\dots$ the learner observes a random context and chooses an action based on its past experience. The learner then observes a random reward whose mean is a function of the context and the action for the round. Under the assumption that the contexts can be lumped into $r \le \min\{S, A\}$ groups such that the mean reward for the various actions is the same for any two contexts that are in the same group, we give an algorithm that outputs an $\epsilon$-optimal policy after using at most $\widetilde O(r(S+A)/\epsilon^2)$ samples with high probability and provide a matching $\widetilde\Omega(r(S+A)/\epsilon^2)$ lower bound. In the regret minimization setting, we give an algorithm whose cumulative regret up to time $T$ is bounded by $\widetilde O(\sqrt{r^3(S+A)T})$. To the best of our knowledge, we are the first to show the near-optimal sample complexity in the PAC setting and $\widetilde O(\sqrt{\mathrm{poly}(r)(S+A)T})$ minimax regret in the online setting for this problem. We also show our algorithms can be applied to more general low-rank bandits and get improved regret bounds in some scenarios.
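The lumpability assumption itself is easy to illustrate (toy data of our own; in the actual problem the mean rewards are unknown and must be learned): contexts whose mean-reward vectors over the actions coincide are merged into one group, shrinking an $S$-context problem to $r \le \min\{S, A\}$ effective contexts.

```python
def lump_contexts(mean_rewards):
    """mean_rewards[s] = tuple of mean rewards per action for context s."""
    groups = {}
    for s, profile in mean_rewards.items():
        groups.setdefault(profile, []).append(s)
    return list(groups.values())

# 4 contexts, 2 actions, but only 2 distinct reward profiles:
rewards = {0: (0.1, 0.9), 1: (0.5, 0.2), 2: (0.1, 0.9), 3: (0.5, 0.2)}
lumped = lump_contexts(rewards)
# -> [[0, 2], [1, 3]]: r = 2 groups instead of S = 4 contexts.
```

The algorithmic content of the paper is precisely that this grouping can be exploited without knowing it in advance, which is why the sample complexity scales with $r(S+A)$ rather than $SA$.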
    Efficient Sensor Placement from Regression with Sparse Gaussian Processes in Continuous and Discrete Spaces. (arXiv:2303.00028v2 [cs.RO] UPDATED)
    We present a novel approach based on sparse Gaussian processes (SGPs) to address the sensor placement problem for monitoring spatially (or spatiotemporally) correlated phenomena such as temperature and precipitation. Existing Gaussian process (GP) based sensor placement approaches use GPs with known kernel function parameters to model a phenomenon and subsequently optimize the sensor locations in a discretized representation of the environment. In our approach, we fit an SGP with known kernel function parameters to randomly sampled unlabeled locations in the environment and show that the learned inducing points of the SGP inherently solve the sensor placement problem in continuous spaces. Using SGPs avoids discretizing the environment and reduces the computation cost from cubic to linear complexity. When restricted to a candidate set of sensor placement locations, we can use greedy sequential selection algorithms on the SGP's optimization bound to find good solutions. We also present an approach to efficiently map our continuous space solutions to discrete solution spaces using the assignment problem, which gives us discrete sensor placements optimized in unison. Moreover, we generalize our approach to model sensors with non-point field-of-view and integrated observations by leveraging the inherent properties of GPs and SGPs. Our experimental results on three real-world datasets show that our approaches generate solution placements that result in reconstruction quality that is consistently on par or better than the prior state-of-the-art approach while being significantly faster. Our computationally efficient approaches will enable both large-scale sensor placement, and fast sensor placement for informative path planning problems.
    Prior Density Learning in Variational Bayesian Phylogenetic Parameters Inference. (arXiv:2302.02522v2 [q-bio.PE] UPDATED)
    The advances in variational inference are providing promising paths in Bayesian estimation problems. These advances make variational phylogenetic inference an alternative approach to Markov Chain Monte Carlo methods for approximating the phylogenetic posterior. However, one of the main drawbacks of such approaches is the modelling of the prior through fixed distributions, which could bias the posterior approximation if they are distant from the current data distribution. In this paper, we propose an approach and an implementation framework to relax the rigidity of the prior densities by learning their parameters using a gradient-based method and a neural network-based parameterization. We applied this approach for branch lengths and evolutionary parameters estimation under several Markov chain substitution models. The results of performed simulations show that the approach is powerful in estimating branch lengths and evolutionary model parameters. They also show that a flexible prior model could provide better results than a predefined prior model. Finally, the results highlight that using neural networks improves the initialization of the optimization of the prior density parameters.
    Robust Semantic Segmentation: Strong Adversarial Attacks and Fast Training of Robust Models. (arXiv:2306.12941v1 [cs.CV])
    While a large amount of work has focused on designing adversarial attacks against image classifiers, only a few methods exist to attack semantic segmentation models. We show that attacking segmentation models presents task-specific challenges, for which we propose novel solutions. Our final evaluation protocol outperforms existing methods and shows that they can overestimate the robustness of the models. Additionally, adversarial training, so far the most successful approach to obtaining robust image classifiers, has not yet been successfully applied to semantic segmentation. We argue that this is because the task to be learned is more challenging and requires significantly higher computational effort than image classification. As a remedy, we show that by taking advantage of recent advances in robust ImageNet classifiers, one can train adversarially robust segmentation models at limited computational cost by fine-tuning robust backbones.
    A prior regularized full waveform inversion using generative diffusion models. (arXiv:2306.12776v1 [physics.geo-ph])
    Full waveform inversion (FWI) has the potential to provide high-resolution subsurface model estimations. However, due to limitations in observation, e.g., regional noise, limited shots or receivers, and band-limited data, it is hard to obtain the desired high-resolution model with FWI. To address this challenge, we propose a new paradigm for FWI regularized by generative diffusion models. Specifically, we pre-train a diffusion model in a fully unsupervised manner on a prior velocity model distribution that represents our expectations of the subsurface and then adapt it to the seismic observations by incorporating the FWI into the sampling process of the generative diffusion models. What makes diffusion models uniquely appropriate for such an implementation is that the generative process retains the form and dimensions of the velocity model. Numerical examples demonstrate that our method can outperform the conventional FWI with only negligible additional computational cost. Even in cases of very sparse observations or observations with strong noise, the proposed method could still reconstruct a high-quality subsurface model. Thus, we can incorporate our prior expectations of the solutions in an efficient manner. We further test this approach on field data, which demonstrates the effectiveness of the proposed method.
    An efficient and straightforward online quantization method for a data stream through remove-birth updating. (arXiv:2306.12574v1 [cs.LG])
    The growth of network-connected devices is creating an explosion of data, known as big data, and posing significant challenges to efficient data analysis. This data is generated continuously, creating a dynamic flow known as a data stream. The characteristics of a data stream may change dynamically, and this change is known as concept drift. Consequently, a method for handling data streams must efficiently reduce their volume while dynamically adapting to these changing characteristics. This paper proposes a simple online vector quantization method that handles concept drift. The proposed method identifies and replaces units with low win probability through remove-birth updating, thus achieving rapid adaptation to concept drift. Furthermore, the results of this study show that the proposed method generates few dead units even in the presence of concept drift. This study also suggests that some metrics calculated by the proposed method will be helpful for drift detection.
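The remove-birth idea can be sketched in a hedged 1-D toy (illustrative update rules of our own, not the paper's exact method): winner-take-all quantization tracks each unit's win probability with an exponential moving average, and a unit whose win probability falls below a threshold is removed and re-born at the latest sample, letting the codebook chase concept drift.

```python
import random

def quantize_stream(stream, units, lr=0.2, decay=0.05, prune=0.01):
    """Online 1-D vector quantization with remove-birth replacement of dead units."""
    wins = [1.0 / len(units)] * len(units)
    for x in stream:
        k = min(range(len(units)), key=lambda i: abs(units[i] - x))
        units[k] += lr * (x - units[k])            # move the winner toward the sample
        for i in range(len(units)):                # EMA of each unit's win probability
            wins[i] = (1 - decay) * wins[i] + decay * (1.0 if i == k else 0.0)
        dead = min(range(len(units)), key=lambda i: wins[i])
        if wins[dead] < prune:                     # remove-birth update
            units[dead] = x                        # re-born at the current sample
            wins[dead] = 1.0 / len(units)
    return units

random.seed(0)
# Concept drift: the data mean jumps from 0 to 10 halfway through the stream.
stream = [random.gauss(0, 0.1) for _ in range(200)] + \
         [random.gauss(10, 0.1) for _ in range(200)]
units = quantize_stream(stream, units=[0.0, 1.0, 2.0])
# After the drift, at least one unit has migrated near the new mode at 10.
```

Without the remove-birth step, units stranded far from the data would never win and would remain dead; pruning them by win probability is what keeps the codebook small and current.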
    Towards More Realistic Membership Inference Attacks on Large Diffusion Models. (arXiv:2306.12983v1 [cs.LG])
    Generative diffusion models, including Stable Diffusion and Midjourney, can generate visually appealing, diverse, and high-resolution images for various applications. These models are trained on billions of internet-sourced images, raising significant concerns about the potential unauthorized use of copyright-protected images. In this paper, we examine whether it is possible to determine if a specific image was used in the training set, a problem known in the cybersecurity community and referred to as a membership inference attack. Our focus is on Stable Diffusion, and we address the challenge of designing a fair evaluation framework to answer this membership question. We propose a methodology to establish a fair evaluation setup and apply it to Stable Diffusion, enabling potential extensions to other generative models. Utilizing this evaluation setup, we execute membership attacks (both known and newly introduced). Our research reveals that previously proposed evaluation setups do not provide a full understanding of the effectiveness of membership inference attacks. We conclude that the membership inference attack remains a significant challenge for large diffusion models (often deployed as black-box systems), indicating that related privacy and copyright issues will persist in the foreseeable future.
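The simplest membership-inference baseline — not the attacks studied in the paper, which are more sophisticated — thresholds the model's loss: members tend to have lower loss than non-members. The losses below are synthetic stand-ins; a fair evaluation, as the abstract argues, additionally requires a carefully designed setup (e.g., reporting true-positive rate at very low false-positive rates).

```python
import random

random.seed(2)

# Loss-threshold membership inference: classify "member" when the
# model's loss on a sample falls below tau. Losses here are synthetic
# draws standing in for a real model's per-sample losses.
members = [random.gauss(0.5, 0.3) for _ in range(1000)]       # members: lower loss
nonmembers = [random.gauss(1.5, 0.5) for _ in range(1000)]    # non-members: higher loss
tau = 1.0

tpr = sum(l < tau for l in members) / len(members)        # true positive rate
fpr = sum(l < tau for l in nonmembers) / len(nonmembers)  # false positive rate
print(tpr, fpr)
```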
    Toward Leveraging Pre-Trained Self-Supervised Frontends for Automatic Singing Voice Understanding Tasks: Three Case Studies. (arXiv:2306.12714v1 [cs.SD])
Automatic singing voice understanding tasks, such as singer identification, singing voice transcription, and singing technique classification, benefit from data-driven approaches that utilize deep learning techniques. These approaches work well even under the rich diversity of vocal and noisy samples owing to their representation ability. However, the limited availability of labeled data remains a significant obstacle to achieving satisfactory performance. In recent years, self-supervised learning models (SSL models) have been trained using large amounts of unlabeled data in the fields of speech processing and music classification. By fine-tuning these models for target tasks, performance comparable to conventional supervised learning can be achieved with limited training data. Therefore, in this paper, we investigate the effectiveness of SSL models for various singing voice understanding tasks. We report the results of experiments comparing SSL models on three different tasks (i.e., singer identification, singing voice transcription, and singing technique classification) as an initial exploration and discuss these findings. Experimental results show that each SSL model achieves performance comparable to, and sometimes exceeding, state-of-the-art methods on each task. We also conducted a layer-wise analysis to further understand the behavior of the SSL models.
    Concept-aware clustering for decentralized deep learning under temporal shift. (arXiv:2306.12768v1 [cs.LG])
Decentralized deep learning requires dealing with non-iid data across clients, which may also change over time due to temporal shifts. While non-iid data has been extensively studied in distributed settings, temporal shifts have received no attention. To the best of our knowledge, we are the first to tackle the novel and challenging problem of decentralized learning with non-iid and dynamic data. We propose a novel algorithm that can automatically discover and adapt to the evolving concepts in the network, without any prior knowledge or estimation of the number of concepts. We evaluate our algorithm on standard benchmark datasets and demonstrate that it outperforms previous methods for decentralized learning.
    Accelerated Training via Incrementally Growing Neural Networks using Variance Transfer and Learning Rate Adaptation. (arXiv:2306.12700v1 [cs.LG])
We develop an approach to efficiently grow neural networks, within which parameterization and optimization strategies are designed by considering their effects on the training dynamics. Unlike existing growing methods, which follow simple replication heuristics or utilize auxiliary gradient-based local optimization, we craft a parameterization scheme which dynamically stabilizes weight, activation, and gradient scaling as the architecture evolves, and maintains the inference functionality of the network. To address the optimization difficulty resulting from imbalanced training effort distributed to subnetworks fading in at different growth phases, we propose a learning rate adaptation mechanism that rebalances the gradient contribution of these separate subcomponents. Experimental results show that our method achieves comparable or better accuracy than training large fixed-size models, while saving a substantial portion of the original computation budget for training. We demonstrate that these gains translate into real wall-clock training speedups.
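The "maintains the inference functionality" invariant can be illustrated with the classic Net2Net-style duplicate-and-halve trick: replicate a hidden unit's incoming weights and halve its outgoing weights, so the widened network computes exactly the same function. This sketch shows only that invariant; the paper's own scheme additionally controls weight, activation, and gradient scaling, which is not reproduced here.

```python
# Function-preserving widening of a tiny 2-layer ReLU net.
def forward(x, W1, W2):
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]  # hidden ReLU
    return [sum(w * hi for w, hi in zip(row, h)) for row in W2]

W1 = [[0.5, -0.2], [0.3, 0.8]]          # 2 inputs -> 2 hidden units
W2 = [[1.0, -1.0]]                      # 2 hidden -> 1 output

# Grow hidden unit 0: replicate its incoming weights, halve its
# outgoing weight so the duplicated pair contributes the same total.
W1_big = W1 + [W1[0][:]]
W2_big = [[W2[0][0] / 2.0, W2[0][1], W2[0][0] / 2.0]]

x = [0.7, -0.4]
print(forward(x, W1, W2), forward(x, W1_big, W2_big))  # identical outputs
```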
    Instruct-FinGPT: Financial Sentiment Analysis by Instruction Tuning of General-Purpose Large Language Models. (arXiv:2306.12659v1 [cs.CL])
    Sentiment analysis is a vital tool for uncovering insights from financial articles, news, and social media, shaping our understanding of market movements. Despite the impressive capabilities of large language models (LLMs) in financial natural language processing (NLP), they still struggle with accurately interpreting numerical values and grasping financial context, limiting their effectiveness in predicting financial sentiment. In this paper, we introduce a simple yet effective instruction tuning approach to address these issues. By transforming a small portion of supervised financial sentiment analysis data into instruction data and fine-tuning a general-purpose LLM with this method, we achieve remarkable advancements in financial sentiment analysis. In the experiment, our approach outperforms state-of-the-art supervised sentiment analysis models, as well as widely used LLMs like ChatGPT and LLaMAs, particularly in scenarios where numerical understanding and contextual comprehension are vital.
    Hierarchical Neural Simulation-Based Inference Over Event Ensembles. (arXiv:2306.12584v1 [stat.ML])
    When analyzing real-world data it is common to work with event ensembles, which comprise sets of observations that collectively constrain the parameters of an underlying model of interest. Such models often have a hierarchical structure, where "local" parameters impact individual events and "global" parameters influence the entire dataset. We introduce practical approaches for optimal dataset-wide probabilistic inference in cases where the likelihood is intractable, but simulations can be realized via forward modeling. We construct neural estimators for the likelihood(-ratio) or posterior and show that explicitly accounting for the model's hierarchical structure can lead to tighter parameter constraints. We ground our discussion using case studies from the physical sciences, focusing on examples from particle physics (particle collider data) and astrophysics (strong gravitational lensing observations).
    MP3: Movement Primitive-Based (Re-)Planning Policy. (arXiv:2306.12729v1 [cs.LG])
We introduce a novel deep reinforcement learning (RL) approach called Movement Primitive-based Planning Policy (MP3). By integrating movement primitives (MPs) into the deep RL framework, MP3 enables the generation of smooth trajectories throughout the whole learning process while effectively learning from sparse and non-Markovian rewards. Additionally, MP3 maintains the capability to adapt to changes in the environment during execution. Although many early successes in robot RL have been achieved by combining RL with MPs, these approaches are often limited to learning single stroke-based motions, lacking the ability to adapt to task variations or adjust motions during execution. Building upon our previous work, which introduced an episode-based RL method for the non-linear adaptation of MP parameters to different task variations, this paper extends the approach to incorporating replanning strategies. This allows adaptation of the MP parameters throughout motion execution, addressing the lack of online motion adaptation in stochastic domains requiring feedback. We compared our approach against state-of-the-art deep RL and RL-with-MPs methods. The results demonstrated improved performance in sophisticated, sparse reward settings and in domains requiring replanning.
    Multi-Objective Hull Form Optimization with CAD Engine-based Deep Learning Physics for 3D Flow Prediction. (arXiv:2306.12915v1 [cs.LG])
In this work, we propose a built-in Deep Learning Physics Optimization (DLPO) framework to set up a shape optimization study of the Duisburg Test Case (DTC) container vessel. We present two different applications: (1) sensitivity analysis to detect the most promising generic basis hull shapes, and (2) multi-objective optimization to quantify the trade-off between optimal hull forms. The DLPO framework allows for the evaluation of design iterations automatically in an end-to-end manner. We achieved these results by coupling Extrality's Deep Learning Physics (DLP) model to a CAD engine and an optimizer. Our proposed DLP model is trained on full 3D volume data coming from RANS simulations, and it can provide accurate and high-quality 3D flow predictions in real-time, which makes it a good evaluator to perform optimization of new container vessel designs w.r.t. the hydrodynamic efficiency. In particular, it is able to recover the forces acting on the vessel by integration on the hull surface with a mean relative error of 3.84\% \pm 2.179\% on the total resistance. Each iteration takes only 20 seconds, thus leading to a drastic saving of time and engineering effort, while delivering valuable insight into the performance of the vessel, including RANS-like detailed flow information. We conclude that the DLPO framework is a promising tool to accelerate the ship design process and lead to more efficient ships with better hydrodynamic performance.
    Class-Incremental Learning based on Label Generation. (arXiv:2306.12619v1 [cs.CL])
    Despite the great success of pre-trained language models, it is still a challenge to use these models for continual learning, especially for the class-incremental learning (CIL) setting due to catastrophic forgetting (CF). This paper reports our finding that if we formulate CIL as a continual label generation problem, CF is drastically reduced and the generalizable representations of pre-trained models can be better retained. We thus propose a new CIL method (VAG) that also leverages the sparsity of vocabulary to focus the generation and creates pseudo-replay samples by using label semantics. Experimental results show that VAG outperforms baselines by a large margin.
    RobustNeuralNetworks.jl: a Package for Machine Learning and Data-Driven Control with Certified Robustness. (arXiv:2306.12612v1 [cs.LG])
    Neural networks are typically sensitive to small input perturbations, leading to unexpected or brittle behaviour. We present RobustNeuralNetworks.jl: a Julia package for neural network models that are constructed to naturally satisfy a set of user-defined robustness constraints. The package is based on the recently proposed Recurrent Equilibrium Network (REN) and Lipschitz-Bounded Deep Network (LBDN) model classes, and is designed to interface directly with Julia's most widely-used machine learning package, Flux.jl. We discuss the theory behind our model parameterization, give an overview of the package, and provide a tutorial demonstrating its use in image classification, reinforcement learning, and nonlinear state-observer design.
    Towards quantum enhanced adversarial robustness in machine learning. (arXiv:2306.12688v1 [quant-ph])
    Machine learning algorithms are powerful tools for data driven tasks such as image classification and feature detection, however their vulnerability to adversarial examples - input samples manipulated to fool the algorithm - remains a serious challenge. The integration of machine learning with quantum computing has the potential to yield tools offering not only better accuracy and computational efficiency, but also superior robustness against adversarial attacks. Indeed, recent work has employed quantum mechanical phenomena to defend against adversarial attacks, spurring the rapid development of the field of quantum adversarial machine learning (QAML) and potentially yielding a new source of quantum advantage. Despite promising early results, there remain challenges towards building robust real-world QAML tools. In this review we discuss recent progress in QAML and identify key challenges. We also suggest future research directions which could determine the route to practicality for QAML approaches as quantum computing hardware scales up and noise levels are reduced.
    Finite-time Lyapunov exponents of deep neural networks. (arXiv:2306.12548v1 [cond-mat.dis-nn])
    We compute how small input perturbations affect the output of deep neural networks, exploring an analogy between deep networks and dynamical systems, where the growth or decay of local perturbations is characterised by finite-time Lyapunov exponents. We show that the maximal exponent forms geometrical structures in input space, akin to coherent structures in dynamical systems. Ridges of large positive exponents divide input space into different regions that the network associates with different classes. These ridges visualise the geometry that deep networks construct in input space, shedding light on the fundamental mechanisms underlying their learning capabilities.
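The quantity in question can be computed directly: push a small perturbation through $L$ layers and measure its average exponential growth rate, $\lambda = \frac{1}{L}\log(\|\delta_{\text{out}}\| / \|\delta_{\text{in}}\|)$. The sketch below uses a fixed toy tanh map rather than a trained network, and a finite-difference perturbation along one direction rather than the full Jacobian spectrum used in the paper.

```python
import math

# Finite-time Lyapunov exponent of a deep map via finite differences.
def layer(x, w=1.5, b=0.3):
    return [math.tanh(w * xi + b) for xi in x]   # toy fixed layer, not trained

def apply_net(x, depth):
    for _ in range(depth):
        x = layer(x)
    return x

def ftle(x, depth, eps=1e-7):
    y0 = apply_net(x, depth)
    xp = [xi + eps for xi in x]                  # perturb along (1, 1, ...)
    y1 = apply_net(xp, depth)
    d_out = math.sqrt(sum((a - b) ** 2 for a, b in zip(y0, y1)))
    d_in = eps * math.sqrt(len(x))
    return math.log(d_out / d_in) / depth        # average growth rate per layer

lam = ftle([0.2, -0.1], depth=6)
print(lam)   # negative here: perturbations contract under this saturating map
```

A positive exponent at some input would mark a ridge where nearby inputs are pushed apart, the class-boundary structure the abstract describes.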
    Identifying and Disentangling Spurious Features in Pretrained Image Representations. (arXiv:2306.12673v1 [cs.LG])
    Neural networks employ spurious correlations in their predictions, resulting in decreased performance when these correlations do not hold. Recent works suggest fixing pretrained representations and training a classification head that does not use spurious features. We investigate how spurious features are represented in pretrained representations and explore strategies for removing information about spurious features. Considering the Waterbirds dataset and a few pretrained representations, we find that even with full knowledge of spurious features, their removal is not straightforward due to entangled representation. To address this, we propose a linear autoencoder training method to separate the representation into core, spurious, and other features. We propose two effective spurious feature removal approaches that are applied to the encoding and significantly improve classification performance measured by worst group accuracy.
    Pure Exploration in Bandits with Linear Constraints. (arXiv:2306.12774v1 [cs.LG])
    We address the problem of identifying the optimal policy with a fixed confidence level in a multi-armed bandit setup, when \emph{the arms are subject to linear constraints}. Unlike the standard best-arm identification problem which is well studied, the optimal policy in this case may not be deterministic and could mix between several arms. This changes the geometry of the problem which we characterize via an information-theoretic lower bound. We introduce two asymptotically optimal algorithms for this setting, one based on the Track-and-Stop method and the other based on a game-theoretic approach. Both these algorithms try to track an optimal allocation based on the lower bound and computed by a weighted projection onto the boundary of a normal cone. Finally, we provide empirical results that validate our bounds and visualize how constraints change the hardness of the problem.
    Vec2Vec: A Compact Neural Network Approach for Transforming Text Embeddings with High Fidelity. (arXiv:2306.12689v1 [cs.CL])
Vector embeddings have become ubiquitous tools for many language-related tasks. A leading embedding model is OpenAI's text-ada-002 which can embed approximately 6,000 words into a 1,536-dimensional vector. While powerful, text-ada-002 is not open source and is only available via API. We trained a simple neural network to convert open-source 768-dimensional MPNet embeddings into text-ada-002 embeddings. We compiled a subset of 50,000 online food reviews. We calculated MPNet and text-ada-002 embeddings for each review and trained a simple neural network for 75 epochs. The neural network was designed to predict the corresponding text-ada-002 embedding for a given MPNet embedding. Our model achieved an average cosine similarity of 0.932 on 10,000 unseen reviews in our held-out test dataset. We manually assessed the quality of our predicted embeddings for vector search over text-ada-002-embedded reviews. While not as good as real text-ada-002 embeddings, predicted embeddings were able to retrieve highly relevant reviews. Our final model, Vec2Vec, is lightweight (<80 MB) and fast. Future steps include training a neural network with a more sophisticated architecture and a larger dataset of paired embeddings to achieve greater performance. The ability to convert between and align embedding spaces may be helpful for interoperability, limiting dependence on proprietary models, protecting data privacy, reducing costs, and offline operations.
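The setup can be miniaturized to show the recipe: learn a map from a "source" embedding space to a "target" space from paired examples, then score held-out pairs by cosine similarity. The dimensions (3 → 4 instead of 768 → 1,536), the data, and the linear model below are synthetic stand-ins for the MPNet/text-ada-002 pairing.

```python
import random, math

random.seed(3)

# Tiny stand-in for Vec2Vec: fit a linear map between embedding
# spaces by SGD on the MSE, then evaluate with cosine similarity.
D_SRC, D_TGT, N = 3, 4, 200
A_true = [[random.gauss(0, 1) for _ in range(D_SRC)] for _ in range(D_TGT)]

def apply(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

pairs = []
for _ in range(N):
    x = [random.gauss(0, 1) for _ in range(D_SRC)]
    y = [yi + random.gauss(0, 0.05) for yi in apply(A_true, x)]   # noisy target pair
    pairs.append((x, y))

A = [[0.0] * D_SRC for _ in range(D_TGT)]
lr = 0.02
for epoch in range(75):
    for x, y in pairs:
        p = apply(A, x)
        for i in range(D_TGT):
            for j in range(D_SRC):
                A[i][j] -= lr * 2 * (p[i] - y[i]) * x[j]          # MSE gradient step

def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

x = [random.gauss(0, 1) for _ in range(D_SRC)]    # held-out query
sim = cosine(apply(A, x), apply(A_true, x))
print(sim)
```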
    Slimmable Encoders for Flexible Split DNNs in Bandwidth and Resource Constrained IoT Systems. (arXiv:2306.12691v1 [cs.LG])
The execution of large deep neural networks (DNN) at mobile edge devices requires considerable consumption of critical resources, such as energy, while imposing demands on hardware capabilities. In approaches based on edge computing, the execution of the models is offloaded to a compute-capable device positioned at the edge of 5G infrastructures. The main issue of the latter class of approaches is the need to transport information-rich signals over wireless links with limited and time-varying capacity. The recent split computing paradigm attempts to resolve this impasse by distributing the execution of DNN models across the layers of the systems to reduce the amount of data to be transmitted while imposing minimal computing load on mobile devices. In this context, we propose a novel split computing approach based on slimmable ensemble encoders. The key advantage of our design is the ability to adapt computational load and transmitted data size in real-time with minimal overhead and time. This is in contrast with existing approaches, where the same adaptation requires costly context switching and model loading. Moreover, our model outperforms existing solutions in terms of compression efficacy and execution time, especially in the context of weak mobile devices. We present a comprehensive comparison with the most advanced split computing solutions, as well as an experimental evaluation on GPU-less devices.
    Memory-Query Tradeoffs for Randomized Convex Optimization. (arXiv:2306.12534v1 [cs.DS])
    We show that any randomized first-order algorithm which minimizes a $d$-dimensional, $1$-Lipschitz convex function over the unit ball must either use $\Omega(d^{2-\delta})$ bits of memory or make $\Omega(d^{1+\delta/6-o(1)})$ queries, for any constant $\delta\in (0,1)$ and when the precision $\epsilon$ is quasipolynomially small in $d$. Our result implies that cutting plane methods, which use $\tilde{O}(d^2)$ bits of memory and $\tilde{O}(d)$ queries, are Pareto-optimal among randomized first-order algorithms, and quadratic memory is required to achieve optimal query complexity for convex optimization.
    Investigating Poor Performance Regions of Black Boxes: LIME-based Exploration in Sepsis Detection. (arXiv:2306.12507v1 [cs.LG])
    Interpreting machine learning models remains a challenge, hindering their adoption in clinical settings. This paper proposes leveraging Local Interpretable Model-Agnostic Explanations (LIME) to provide interpretable descriptions of black box classification models in high-stakes sepsis detection. By analyzing misclassified instances, significant features contributing to suboptimal performance are identified. The analysis reveals regions where the classifier performs poorly, allowing the calculation of error rates within these regions. This knowledge is crucial for cautious decision-making in sepsis detection and other critical applications. The proposed approach is demonstrated using the eICU dataset, effectively identifying and visualizing regions where the classifier underperforms. By enhancing interpretability, our method promotes the adoption of machine learning models in clinical practice, empowering informed decision-making and mitigating risks in critical scenarios.
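LIME's core recipe — sample perturbations around an instance, weight them by proximity, fit a weighted linear surrogate — can be stripped down to one feature. The "black box" below is a made-up sigmoid risk score, not the eICU sepsis classifier; real usage would go through the `lime` package rather than this hand-rolled fit.

```python
import math, random

random.seed(4)

def black_box(x):
    return 1.0 / (1.0 + math.exp(-3.0 * (x - 2.0)))   # stand-in "risk score"

def lime_slope(x0, n=500, spread=1.0, kernel_width=0.5):
    xs = [x0 + random.gauss(0, spread) for _ in range(n)]   # local perturbations
    ys = [black_box(x) for x in xs]
    ws = [math.exp(-((x - x0) ** 2) / kernel_width ** 2) for x in xs]  # proximity weights
    # weighted least squares for y = a + b*x, in closed form
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    cov = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    return cov / var                                   # local feature attribution

s_mid = lime_slope(2.0)   # near the decision boundary: steep local model
s_far = lime_slope(5.0)   # deep in one class: nearly flat local model
print(s_mid, s_far)
```

Aggregating such local explanations over misclassified instances is what lets the paper map out the regions where the classifier underperforms.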
    On Exploring Node-feature and Graph-structure Diversities for Node Drop Graph Pooling. (arXiv:2306.12726v1 [cs.LG])
A pooling operation is essential for effective graph-level representation learning, where node drop pooling has become a mainstream graph pooling technique. However, current node drop pooling methods usually keep the top-k nodes according to their significance scores, ignoring graph diversity in terms of node features and graph structures, thus resulting in suboptimal graph-level representations. To address the aforementioned issue, we propose a novel plug-and-play score scheme and refer to it as MID, which consists of a \textbf{M}ultidimensional score space with two operations, \textit{i.e.}, fl\textbf{I}pscore and \textbf{D}ropscore. Specifically, the multidimensional score space depicts the significance of nodes through multiple criteria; the flipscore encourages the maintenance of dissimilar node features; and the dropscore forces the model to notice diverse graph structures instead of being stuck in significant local structures. To evaluate the effectiveness of our proposed MID, we perform extensive experiments by applying it to a wide variety of recent node drop pooling methods, including TopKPool, SAGPool, GSAPool, and ASAP. Specifically, the proposed MID can efficiently and consistently achieve about 2.8\% average improvements over the above four methods on seventeen real-world graph classification datasets, including four social datasets (IMDB-BINARY, IMDB-MULTI, REDDIT-BINARY, and COLLAB), and thirteen biochemical datasets (D\&D, PROTEINS, NCI1, MUTAG, PTC-MR, NCI109, ENZYMES, MUTAGENICITY, FRANKENSTEIN, HIV, BBBP, TOXCAST, and TOX21). Code is available at~\url{https://github.com/whuchuang/mid}.
    Generalized Low-Rank Update: Model Parameter Bounds for Low-Rank Training Data Modifications. (arXiv:2306.12670v1 [stat.ML])
    In this study, we have developed an incremental machine learning (ML) method that efficiently obtains the optimal model when a small number of instances or features are added or removed. This problem holds practical importance in model selection, such as cross-validation (CV) and feature selection. Among the class of ML methods known as linear estimators, there exists an efficient model update framework called the low-rank update that can effectively handle changes in a small number of rows and columns within the data matrix. However, for ML methods beyond linear estimators, there is currently no comprehensive framework available to obtain knowledge about the updated solution within a specific computational complexity. In light of this, our study introduces a method called the Generalized Low-Rank Update (GLRU) which extends the low-rank update framework of linear estimators to ML methods formulated as a certain class of regularized empirical risk minimization, including commonly used methods such as SVM and logistic regression. The proposed GLRU method not only expands the range of its applicability but also provides information about the updated solutions with a computational complexity proportional to the amount of dataset changes. To demonstrate the effectiveness of the GLRU method, we conduct experiments showcasing its efficiency in performing cross-validation and feature selection compared to other baseline methods.
    Learning Conditional Instrumental Variable Representation for Causal Effect Estimation. (arXiv:2306.12453v1 [cs.LG])
One of the fundamental challenges in causal inference is to estimate the causal effect of a treatment on its outcome of interest from observational data. However, causal effect estimation often suffers from the impacts of confounding bias caused by unmeasured confounders that affect both the treatment and the outcome. The instrumental variable (IV) approach is a powerful way to eliminate the confounding bias from latent confounders. However, the existing IV-based estimators require a nominated IV, and for a conditional IV (CIV) the corresponding conditioning set too, for causal effect estimation. This limits the application of IV-based estimators. In this paper, by leveraging the advantage of disentangled representation learning, we propose a novel method, named DVAE.CIV, for learning and disentangling the representations of a CIV and of its conditioning set for causal effect estimation from data with latent confounders. Extensive experimental results on both synthetic and real-world datasets demonstrate the superiority of the proposed DVAE.CIV method against the existing causal effect estimators.
    Neural Multigrid Memory For Computational Fluid Dynamics. (arXiv:2306.12545v1 [physics.flu-dyn])
    Turbulent flow simulation plays a crucial role in various applications, including aircraft and ship design, industrial process optimization, and weather prediction. In this paper, we propose an advanced data-driven method for simulating turbulent flow, representing a significant improvement over existing approaches. Our methodology combines the strengths of Video Prediction Transformer (VPTR) (Ye & Bilodeau, 2022) and Multigrid Architecture (MgConv, MgResnet) (Ke et al., 2017). VPTR excels in capturing complex spatiotemporal dependencies and handling large input data, making it a promising choice for turbulent flow prediction. Meanwhile, Multigrid Architecture utilizes multiple grids with different resolutions to capture the multiscale nature of turbulent flows, resulting in more accurate and efficient simulations. Through our experiments, we demonstrate the effectiveness of our proposed approach, named MGxTransformer, in accurately predicting velocity, temperature, and turbulence intensity for incompressible turbulent flows across various geometries and flow conditions. Our results exhibit superior accuracy compared to other baselines, while maintaining computational efficiency.
    Aligning Synthetic Medical Images with Clinical Knowledge using Human Feedback. (arXiv:2306.12438v1 [eess.IV])
    Generative models capable of capturing nuanced clinical features in medical images hold great promise for facilitating clinical data sharing, enhancing rare disease datasets, and efficiently synthesizing annotated medical images at scale. Despite their potential, assessing the quality of synthetic medical images remains a challenge. While modern generative models can synthesize visually-realistic medical images, the clinical validity of these images may be called into question. Domain-agnostic scores, such as FID score, precision, and recall, cannot incorporate clinical knowledge and are, therefore, not suitable for assessing clinical sensibility. Additionally, there are numerous unpredictable ways in which generative models may fail to synthesize clinically plausible images, making it challenging to anticipate potential failures and manually design scores for their detection. To address these challenges, this paper introduces a pathologist-in-the-loop framework for generating clinically-plausible synthetic medical images. Starting with a diffusion model pretrained using real images, our framework comprises three steps: (1) evaluating the generated images by expert pathologists to assess whether they satisfy clinical desiderata, (2) training a reward model that predicts the pathologist feedback on new samples, and (3) incorporating expert knowledge into the diffusion model by using the reward model to inform a finetuning objective. We show that human feedback significantly improves the quality of synthetic images in terms of fidelity, diversity, utility in downstream applications, and plausibility as evaluated by experts.
    Empirical Risk Minimization with Shuffled SGD: A Primal-Dual Perspective and Improved Bounds. (arXiv:2306.12498v1 [math.OC])
    Stochastic gradient descent (SGD) is perhaps the most prevalent optimization method in modern machine learning. Contrary to the empirical practice of sampling from the datasets without replacement and with (possible) reshuffling at each epoch, the theoretical counterpart of SGD usually relies on the assumption of sampling with replacement. It is only very recently that SGD with sampling without replacement -- shuffled SGD -- has been analyzed. For convex finite sum problems with $n$ components and under the $L$-smoothness assumption for each component function, there are matching upper and lower bounds, under sufficiently small -- $\mathcal{O}(\frac{1}{nL})$ -- step sizes. Yet those bounds appear too pessimistic -- in fact, the predicted performance is generally no better than for full gradient descent -- and do not agree with the empirical observations. In this work, to narrow the gap between the theory and practice of shuffled SGD, we sharpen the focus from general finite sum problems to empirical risk minimization with linear predictors. This allows us to take a primal-dual perspective and interpret shuffled SGD as a primal-dual method with cyclic coordinate updates on the dual side. Leveraging this perspective, we prove a fine-grained complexity bound that depends on the data matrix and is never worse than what is predicted by the existing bounds. Notably, our bound can predict much faster convergence than the existing analyses -- by a factor of the order of $\sqrt{n}$ in some cases. We empirically demonstrate that on common machine learning datasets our bound is indeed much tighter. We further show how to extend our analysis to convex nonsmooth problems, with similar improvements.
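The object of study — without-replacement sampling with reshuffling at each epoch — looks as follows on least squares with a linear predictor. This sketch only reproduces the sampling scheme the paper analyzes; the primal-dual analysis and step-size regimes are not represented here, and the data and learning rate are made up.

```python
import random

random.seed(5)

# Shuffled SGD: each epoch visits every sample exactly once in a
# fresh random order, matching empirical practice (in contrast to
# the with-replacement sampling most theory assumes).
n, lr = 50, 0.05
data = []
for _ in range(n):
    x = random.gauss(0, 1)
    data.append((x, 2.0 * x + random.gauss(0, 0.1)))   # y = 2x + noise

w = 0.0
for epoch in range(30):
    order = list(range(n))
    random.shuffle(order)            # reshuffle at each epoch
    for i in order:                  # each sample used exactly once per epoch
        x, y = data[i]
        w -= lr * 2 * (w * x - y) * x   # squared-loss gradient on one sample

print(w)   # close to the true slope 2.0
```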
    Modeling T1 Resting-State MRI Variants Using Convolutional Neural Networks in Diagnosis of OCD. (arXiv:2306.12435v1 [q-bio.NC])
Obsessive-compulsive disorder (OCD) presents itself as a highly debilitating disorder. The disorder has common associations with the prefrontal cortex and the glutamate receptor known as Metabotropic Glutamate Receptor 5 (mGluR5). This receptor has been observed to demonstrate higher levels of signaling from positron emission tomography scans measured by its distribution volume ratios in mice. Despite this evidence, studies are unable to fully verify the involvement of mGluR5 as more empirical data is needed. Computational modeling methods were used as a means of validation for previous hypotheses involving mGluR5. The inadequacies in relation to the causal factor of OCD were answered by utilizing T1 resting-state magnetic resonance imaging (TRS-MRI) scans of patients suffering from schizophrenia, major depressive disorder, and obsessive-compulsive disorder. Because comorbid cases often occur within these disorders, cross-comparative abilities become necessary to find distinctive characteristics. Two-dimensional convolutional neural networks alongside ResNet50 and MobileNet models were constructed and evaluated for efficiency. Activation heatmaps of TRS-MRI scans were outputted, allowing for transcriptomics analysis. However, the models' inability to predict OCD cases prevented gene expression analysis. Across all models, there was an 88.75\% validation accuracy for MDD, and 82.08\% validation accuracy for SZD under the framework of ResNet50 as well as novel computation. OCD yielded an accuracy rate of ~54.4\%. These results provided further evidence for the p-factor theory of mental disorders. Future work involves the application of transfer learning to bolster accuracy rates.
    Constant Memory Attention Block. (arXiv:2306.12599v1 [cs.LG])
Modern foundation model architectures rely on attention mechanisms to effectively capture context. However, these methods require linear or quadratic memory in terms of the number of inputs/datapoints, limiting their applicability in low-compute domains. In this work, we propose the Constant Memory Attention Block (CMAB), a novel general-purpose attention block that computes its output in constant memory and performs updates in constant computation. Highlighting CMAB's efficacy, we introduce methods for Neural Processes and Temporal Point Processes. Empirically, we show our proposed methods achieve results competitive with the state-of-the-art while being significantly more memory efficient.
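The paper's CMAB construction is not reproduced here, but the general flavor of constant-memory attention can be illustrated with the standard online-softmax recurrence: a single query attends over a stream of key/value pairs while keeping only a running maximum, a running normalizer, and a running weighted sum. A hedged pure-Python sketch (all names are illustrative, not from the paper):

```python
import math

def streaming_attention(q, kv_stream):
    """Attention output for one query over a stream of (key, value) pairs,
    using the online-softmax recurrence. Memory use is constant in the
    stream length: only a running max, normalizer, and weighted sum."""
    m = float("-inf")  # running max of scores (numerical stability)
    z = 0.0            # running softmax normalizer
    acc = None         # running weighted sum of values
    for k, v in kv_stream:
        s = sum(qi * ki for qi, ki in zip(q, k))  # dot-product score
        m_new = max(m, s)
        scale = math.exp(m - m_new) if m != float("-inf") else 0.0
        z = z * scale + math.exp(s - m_new)
        if acc is None:
            acc = [math.exp(s - m_new) * vi for vi in v]
        else:
            acc = [a * scale + math.exp(s - m_new) * vi
                   for a, vi in zip(acc, v)]
        m = m_new
    return [a / z for a in acc]
```

The result is bit-for-bit the softmax attention output, computed without ever materializing the full score vector.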
    Communication-Efficient Federated Learning through Importance Sampling. (arXiv:2306.12625v1 [cs.LG])
    The high communication cost of sending model updates from the clients to the server is a significant bottleneck for scalable federated learning (FL). Among existing approaches, state-of-the-art bitrate-accuracy tradeoffs have been achieved using stochastic compression methods -- in which the client $n$ sends a sample from a client-only probability distribution $q_{\phi^{(n)}}$, and the server estimates the mean of the clients' distributions using these samples. However, such methods do not take full advantage of the FL setup where the server, throughout the training process, has side information in the form of a pre-data distribution $p_{\theta}$ that is close to the client's distribution $q_{\phi^{(n)}}$ in Kullback-Leibler (KL) divergence. In this work, we exploit this closeness between the clients' distributions $q_{\phi^{(n)}}$'s and the side information $p_{\theta}$ at the server, and propose a framework that requires approximately $D_{KL}(q_{\phi^{(n)}}|| p_{\theta})$ bits of communication. We show that our method can be integrated into many existing stochastic compression frameworks such as FedPM, Federated SGLD, and QSGD to attain the same (and often higher) test accuracy with up to $50$ times reduction in the bitrate.
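The $D_{KL}(q_{\phi^{(n)}} \| p_{\theta})$ communication budget the abstract refers to is straightforward to compute for discrete distributions. A minimal sketch (illustrative only; the paper's framework operates on model-update distributions, not toy probability vectors):

```python
import math

def kl_bits(q, p):
    """KL divergence D_KL(q || p) in bits: roughly the number of bits
    needed to communicate a sample from q to a receiver that already
    knows the side-information distribution p."""
    return sum(qi * math.log2(qi / pi) for qi, pi in zip(q, p) if qi > 0)
```

When the client distribution is close to the server's side information, the bit budget is small (and exactly zero when they coincide), which is the intuition behind the bitrate savings claimed above.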
    Outlier-robust Estimation of a Sparse Linear Model Using Invexity. (arXiv:2306.12678v1 [cs.LG])
In this paper, we study the problem of estimating a sparse regression vector with correct support in the presence of outlier samples. The inconsistency of lasso-type methods is well known in this scenario. We propose a combinatorial version of outlier-robust lasso that also identifies clean samples. Subsequently, we use these clean samples to obtain a good estimate. We also provide a novel invex relaxation for the combinatorial problem and provide provable theoretical guarantees for this relaxation. Finally, we conduct experiments to validate our theory and compare our results against the standard lasso.
    Balanced Training of Energy-Based Models with Adaptive Flow Sampling. (arXiv:2306.00684v2 [cs.LG] UPDATED)
    Energy-based models (EBMs) are versatile density estimation models that directly parameterize an unnormalized log density. Although very flexible, EBMs lack a specified normalization constant of the model, making the likelihood of the model computationally intractable. Several approximate samplers and variational inference techniques have been proposed to estimate the likelihood gradients for training. These techniques have shown promising results in generating samples, but little attention has been paid to the statistical accuracy of the estimated density, such as determining the relative importance of different classes in a dataset. In this work, we propose a new maximum likelihood training algorithm for EBMs that uses a different type of generative model, normalizing flows (NF), which have recently been proposed to facilitate sampling. Our method fits an NF to an EBM during training so that an NF-assisted sampling scheme provides an accurate gradient for the EBMs at all times, ultimately leading to a fast sampler for generating new data.
    MPSTAN: Metapopulation-based Spatio-Temporal Attention Network for Epidemic Forecasting. (arXiv:2306.12436v1 [cs.LG])
Accurate epidemic forecasting plays a vital role for governments in developing effective prevention measures for suppressing epidemics. Most of the present spatio-temporal models cannot provide a general framework for stable and accurate forecasting of epidemics with diverse evolution trends. Incorporating epidemiological domain knowledge ranging from single-patch to multi-patch into neural networks is expected to improve forecasting accuracy. However, relying solely on single-patch knowledge neglects inter-patch interactions, while constructing multi-patch knowledge is challenging without population mobility data. To address the aforementioned problems, we propose a novel hybrid model called Metapopulation-based Spatio-Temporal Attention Network (MPSTAN). This model aims to improve the accuracy of epidemic forecasting by incorporating multi-patch epidemiological knowledge into a spatio-temporal model and adaptively defining inter-patch interactions. Moreover, we incorporate inter-patch epidemiological knowledge into both the model construction and loss function to help the model learn epidemic transmission dynamics. Extensive experiments conducted on two representative datasets with different epidemiological evolution trends demonstrate that our proposed model outperforms the baselines and provides more accurate and stable short- and long-term forecasting. We confirm the effectiveness of domain knowledge in the learning model and investigate the impact of different ways of integrating domain knowledge on forecasting. We observe that using domain knowledge in both model construction and loss functions leads to more efficient forecasting, and selecting appropriate domain knowledge can improve accuracy further.
    Adversarial Training with Generated Data in High-Dimensional Regression: An Asymptotic Study. (arXiv:2306.12582v1 [stat.ML])
    In recent years, studies such as \cite{carmon2019unlabeled,gowal2021improving,xing2022artificial} have demonstrated that incorporating additional real or generated data with pseudo-labels can enhance adversarial training through a two-stage training approach. In this paper, we perform a theoretical analysis of the asymptotic behavior of this method in high-dimensional linear regression. While a double-descent phenomenon can be observed in ridgeless training, with an appropriate $\mathcal{L}_2$ regularization, the two-stage adversarial training achieves a better performance. Finally, we derive a shortcut cross-validation formula specifically tailored for the two-stage training method.
    Comparing deep learning models for volatility prediction using multivariate data. (arXiv:2306.12446v1 [q-fin.ST])
This study aims at comparing several deep learning-based forecasters in the task of volatility prediction using multivariate data, proceeding from simpler or shallower to deeper and more complex models, and comparing them to the naive prediction and variations of classical GARCH models. Specifically, the volatility of five assets (i.e., S\&P500, NASDAQ100, gold, silver, and oil) was predicted with the GARCH models, Multi-Layer Perceptrons, recurrent neural networks, Temporal Convolutional Networks, and the Temporal Fusion Transformer. In most cases, the Temporal Fusion Transformer, followed by variants of the Temporal Convolutional Network, outperformed classical approaches and shallow networks. These experiments were repeated, and the difference between competing models was shown to be statistically significant, therefore encouraging their use in practice.
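As a reference point for the classical baselines mentioned above, the GARCH(1,1) conditional-variance recursion can be written in a few lines. This is the generic textbook recursion, not the study's fitted models; the parameter values in the usage example are illustrative:

```python
def garch11_variance(returns, omega, alpha, beta):
    """GARCH(1,1) conditional-variance recursion:
        sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1},
    initialized at the sample variance of the return series."""
    mean = sum(returns) / len(returns)
    sigma2 = [sum((r - mean) ** 2 for r in returns) / len(returns)]
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r ** 2 + beta * sigma2[-1])
    return sigma2
```

With `omega, alpha, beta = 0.01, 0.1, 0.8` and a short toy return series, the recursion produces a strictly positive variance path, since `alpha + beta < 1` and `omega > 0`.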
    Verifying Global Neural Network Specifications using Hyperproperties. (arXiv:2306.12495v1 [cs.LG])
Current approaches to neural network verification focus on specifications that target small regions around known input data points, such as local robustness. Thus, using these approaches, we cannot obtain guarantees for inputs that are not close to known inputs. Yet, it is highly likely that a neural network will encounter such truly unseen inputs during its application. We study global specifications that - when satisfied - provide guarantees for all potential inputs. We introduce a hyperproperty formalism that allows for expressing global specifications such as monotonicity, Lipschitz continuity, global robustness, and dependency fairness. Our formalism enables verifying global specifications using existing neural network verification approaches by leveraging capabilities for verifying general computational graphs. Thereby, we extend the scope of guarantees that can be provided using existing methods. Recent success in verifying specific global specifications shows that attaining strong guarantees for all potential data points is feasible.
    Semi-Implicit Denoising Diffusion Models (SIDDMs). (arXiv:2306.12511v1 [cs.LG])
    Despite the proliferation of generative models, achieving fast sampling during inference without compromising sample diversity and quality remains challenging. Existing models such as Denoising Diffusion Probabilistic Models (DDPM) deliver high-quality, diverse samples but are slowed by an inherently high number of iterative steps. The Denoising Diffusion Generative Adversarial Networks (DDGAN) attempted to circumvent this limitation by integrating a GAN model for larger jumps in the diffusion process. However, DDGAN encountered scalability limitations when applied to large datasets. To address these limitations, we introduce a novel approach that tackles the problem by matching implicit and explicit factors. More specifically, our approach involves utilizing an implicit model to match the marginal distributions of noisy data and the explicit conditional distribution of the forward diffusion. This combination allows us to effectively match the joint denoising distributions. Unlike DDPM but similar to DDGAN, we do not enforce a parametric distribution for the reverse step, enabling us to take large steps during inference. Similar to the DDPM but unlike DDGAN, we take advantage of the exact form of the diffusion process. We demonstrate that our proposed method obtains comparable generative performance to diffusion-based models and vastly superior results to models with a small number of sampling steps.
    HypeRS: Building a Hypergraph-driven ensemble Recommender System. (arXiv:2306.12800v1 [cs.IR])
    Recommender systems are designed to predict user preferences over collections of items. These systems process users' previous interactions to decide which items should be ranked higher to satisfy their desires. An ensemble recommender system can achieve great recommendation performance by effectively combining the decisions generated by individual models. In this paper, we propose a novel ensemble recommender system that combines predictions made by different models into a unified hypergraph ranking framework. This is the first time that hypergraph ranking has been employed to model an ensemble of recommender systems. Hypergraphs are generalizations of graphs where multiple vertices can be connected via hyperedges, efficiently modeling high-order relations. We differentiate real and predicted connections between users and items by assigning different hyperedge weights to individual recommender systems. We perform experiments using four datasets from the fields of movie, music and news media recommendation. The obtained results show that the ensemble hypergraph ranking method generates more accurate recommendations compared to the individual models and a weighted hybrid approach. The assignment of different hyperedge weights to the ensemble hypergraph further improves the performance compared to a setting with identical hyperedge weights.
    XAI-TRIS: Non-linear benchmarks to quantify ML explanation performance. (arXiv:2306.12816v1 [cs.LG])
    The field of 'explainable' artificial intelligence (XAI) has produced highly cited methods that seek to make the decisions of complex machine learning (ML) methods 'understandable' to humans, for example by attributing 'importance' scores to input features. Yet, a lack of formal underpinning leaves it unclear as to what conclusions can safely be drawn from the results of a given XAI method and has also so far hindered the theoretical verification and empirical validation of XAI methods. This means that challenging non-linear problems, typically solved by deep neural networks, presently lack appropriate remedies. Here, we craft benchmark datasets for three different non-linear classification scenarios, in which the important class-conditional features are known by design, serving as ground truth explanations. Using novel quantitative metrics, we benchmark the explanation performance of a wide set of XAI methods across three deep learning model architectures. We show that popular XAI methods are often unable to significantly outperform random performance baselines and edge detection methods. Moreover, we demonstrate that explanations derived from different model architectures can be vastly different; thus, prone to misinterpretation even under controlled conditions.
    On the Robustness of Generative Retrieval Models: An Out-of-Distribution Perspective. (arXiv:2306.12756v1 [cs.IR])
Recently, we have witnessed generative retrieval increasingly gaining attention in the information retrieval (IR) field, which retrieves documents by directly generating their identifiers. So far, much effort has been devoted to developing effective generative retrieval models, while less attention has been paid to the robustness perspective. When a new retrieval paradigm enters real-world applications, it is also critical to measure its out-of-distribution (OOD) generalization, i.e., how generative retrieval models would generalize to new distributions. To answer this question, we first define OOD robustness from three perspectives in retrieval problems: 1) the query variations; 2) the unforeseen query types; and 3) the unforeseen tasks. Based on this taxonomy, we conduct empirical studies to analyze the OOD robustness of several representative generative retrieval models against dense retrieval models. The empirical results indicate that the OOD robustness of generative retrieval models requires enhancement. We hope studying the OOD robustness of generative retrieval models would be advantageous to the IR community.
    Density Uncertainty Layers for Reliable Uncertainty Estimation. (arXiv:2306.12497v1 [cs.LG])
Assessing the predictive uncertainty of deep neural networks is crucial for safety-related applications of deep learning. Although Bayesian deep learning offers a principled framework for estimating model uncertainty, the approaches that are commonly used to approximate the posterior often fail to deliver reliable estimates of predictive uncertainty. In this paper we propose a novel criterion for predictive uncertainty: a model's predictive variance should be grounded in the empirical density of the input. It should produce higher uncertainty for inputs that are improbable in the training data and lower uncertainty for those inputs that are more probable. To operationalize this criterion, we develop the density uncertainty layer, an architectural element for a stochastic neural network that guarantees that the density uncertainty criterion is satisfied. We study neural networks with density uncertainty layers on the CIFAR-10 and CIFAR-100 uncertainty benchmarks. Compared to existing approaches, we find that density uncertainty layers provide reliable uncertainty estimates and robust out-of-distribution detection performance.
    Factors Affecting the Performance of Automated Speaker Verification in Alzheimer's Disease Clinical Trials. (arXiv:2306.12444v1 [eess.AS])
Detecting duplicate patient participation in clinical trials is a major challenge because repeated patients can undermine the credibility and accuracy of the trial's findings and result in significant health and financial risks. Developing accurate automated speaker verification (ASV) models is crucial to verify the identity of enrolled individuals and remove duplicates, but the size and quality of data influence ASV performance. However, there has been limited investigation into the factors that can affect ASV capabilities in clinical environments. In this paper, we bridge this gap by conducting an analysis of how participant demographic characteristics, audio quality criteria, and severity level of Alzheimer's disease (AD) impact the performance of ASV, utilizing a dataset of speech recordings from 659 participants with varying levels of AD, obtained through multiple speech tasks. Our results indicate that ASV performance: 1) is slightly better on male speakers than on female speakers; 2) degrades for individuals who are above 70 years old; 3) is comparatively better for non-native English speakers than for native English speakers; 4) is negatively affected by clinician interference, noisy background, and unclear participant speech; 5) tends to decrease with an increase in the severity level of AD. Our study finds that voice biometrics raise fairness concerns as certain subgroups exhibit different ASV performances owing to their inherent voice characteristics. Moreover, the performance of ASV is influenced by the quality of speech recordings, which underscores the importance of improving the data collection settings in clinical trials.
    Robust Statistical Comparison of Random Variables with Locally Varying Scale of Measurement. (arXiv:2306.12803v1 [stat.ML])
    Spaces with locally varying scale of measurement, like multidimensional structures with differently scaled dimensions, are pretty common in statistics and machine learning. Nevertheless, it is still understood as an open question how to exploit the entire information encoded in them properly. We address this problem by considering an order based on (sets of) expectations of random variables mapping into such non-standard spaces. This order contains stochastic dominance and expectation order as extreme cases when no, or respectively perfect, cardinal structure is given. We derive a (regularized) statistical test for our proposed generalized stochastic dominance (GSD) order, operationalize it by linear optimization, and robustify it by imprecise probability models. Our findings are illustrated with data from multidimensional poverty measurement, finance, and medicine.
    Knowledge Distillation via Token-level Relationship Graph. (arXiv:2306.12442v1 [cs.LG])
Knowledge distillation is a powerful technique for transferring knowledge from a pre-trained teacher model to a student model. However, the true potential of knowledge transfer has not been fully explored. Existing approaches primarily focus on distilling individual information or instance-level relationships, overlooking the valuable information embedded in token-level relationships, which may be particularly affected by long-tail effects. To address the above limitations, we propose a novel method called Knowledge Distillation with Token-level Relationship Graph (TRG) that leverages token-wise relational knowledge to enhance the performance of knowledge distillation. By employing TRG, the student model can effectively emulate higher-level semantic information from the teacher model, resulting in improved distillation results. To further enhance the learning process, we introduce a token-wise contextual loss, which encourages the student model to capture the inner-instance semantic context of the teacher model. We conduct experiments to evaluate the effectiveness of the proposed method against several state-of-the-art approaches. Empirical results demonstrate the superiority of TRG across various visual classification tasks, including those involving imbalanced data. Our method consistently outperforms the existing baselines, establishing a new state-of-the-art performance in the field of knowledge distillation.
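TRG's exact formulation is in the paper; a generic way to distill token-level relationships is to match Gram matrices of token embeddings between teacher and student, so the student reproduces the teacher's token-to-token structure rather than individual features. A hedged sketch (function names and the mean-squared-error choice are illustrative, not the paper's loss):

```python
def token_relation_matrix(tokens):
    """Pairwise token-relation (Gram) matrix: entry (i, j) is the dot
    product between token embeddings i and j."""
    return [[sum(a * b for a, b in zip(ti, tj)) for tj in tokens]
            for ti in tokens]

def relation_distillation_loss(teacher_tokens, student_tokens):
    """Mean squared error between the teacher's and student's
    token-relation matrices."""
    T = token_relation_matrix(teacher_tokens)
    S = token_relation_matrix(student_tokens)
    n = len(T)
    return sum((T[i][j] - S[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)
```

The loss is zero exactly when the student's pairwise token geometry matches the teacher's, independent of any shared coordinate system.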
    Deep Dynamic Epidemiological Modelling for COVID-19 Forecasting in Multi-level Districts. (arXiv:2306.12457v1 [cs.LG])
Objective: COVID-19 has spread worldwide and had a huge impact across the globe. Modeling the infectious spread of COVID-19 is essential to understand the current condition and to formulate intervention measures. Epidemiological equations based on the SEIR model simulate disease development. The traditional parameter estimation method to solve SEIR equations could not precisely fit real-world data due to different situations, such as social distancing policies and intervention strategies. Additionally, learning-based models achieve outstanding fitting performance, but cannot visualize mechanisms. Methods: Thus, we propose a deep dynamic epidemiological (DDE) method that combines epidemiological equations and deep-learning advantages to obtain high accuracy and visualization. The DDE contains deep networks to fit the effect function to simulate the ever-changing situations, based on the neural ODE method for solving the variants' equations, ensuring the fitting performance in multi-level areas. Results: We introduce four SEIR variants to fit different situations in different countries and regions. We compare our DDE method with traditional parameter estimation methods (Nelder-Mead, BFGS, Powell, Truncated Newton Conjugate-Gradient, Neural ODE) in fitting the real-world data in the cases of countries (the USA, Colombia, South Africa) and regions (Wuhan in China, Piedmont in Italy). Our DDE method achieves the best Mean Square Error and Pearson coefficient in all five areas. Further, compared with state-of-the-art learning-based approaches, the DDE outperforms all techniques, including LSTM, RNN, GRU, Random Forest, Extremely Random Trees, and Decision Tree. Conclusion: DDE presents outstanding predictive ability and visualized display of the changes in infection rates in different regions and countries.
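The SEIR dynamics underlying the DDE method can be sketched with a simple forward-Euler integrator on a normalized population. This is the vanilla SEIR model, not the paper's neural-ODE variants; parameter values in the usage example are illustrative:

```python
def seir_step(s, e, i, r, beta, sigma, gamma, dt=1.0):
    """One forward-Euler step of the SEIR equations:
        dS/dt = -beta*S*I/N,  dE/dt = beta*S*I/N - sigma*E,
        dI/dt = sigma*E - gamma*I,  dR/dt = gamma*I."""
    n = s + e + i + r
    ds = -beta * s * i / n
    de = beta * s * i / n - sigma * e
    di = sigma * e - gamma * i
    dr = gamma * i
    return s + dt * ds, e + dt * de, i + dt * di, r + dt * dr

def simulate_seir(s, e, i, r, beta, sigma, gamma, steps, dt=1.0):
    """Integrate the SEIR system and return the full trajectory."""
    traj = [(s, e, i, r)]
    for _ in range(steps):
        s, e, i, r = seir_step(s, e, i, r, beta, sigma, gamma, dt)
        traj.append((s, e, i, r))
    return traj
```

A quick sanity check on any SEIR integrator: the four compartments always sum to the total population, and the recovered fraction grows monotonically while infections are present.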
    On Addressing the Limitations of Graph Neural Networks. (arXiv:2306.12640v1 [cs.LG])
    This report gives a summary of two problems about graph convolutional networks (GCNs): over-smoothing and heterophily challenges, and outlines future directions to explore.
    Comparative Analysis of Segment Anything Model and U-Net for Breast Tumor Detection in Ultrasound and Mammography Images. (arXiv:2306.12510v1 [eess.IV])
    In this study, the main objective is to develop an algorithm capable of identifying and delineating tumor regions in breast ultrasound (BUS) and mammographic images. The technique employs two advanced deep learning architectures, namely U-Net and pretrained SAM, for tumor segmentation. The U-Net model is specifically designed for medical image segmentation and leverages its deep convolutional neural network framework to extract meaningful features from input images. On the other hand, the pretrained SAM architecture incorporates a mechanism to capture spatial dependencies and generate segmentation results. Evaluation is conducted on a diverse dataset containing annotated tumor regions in BUS and mammographic images, covering both benign and malignant tumors. This dataset enables a comprehensive assessment of the algorithm's performance across different tumor types. Results demonstrate that the U-Net model outperforms the pretrained SAM architecture in accurately identifying and segmenting tumor regions in both BUS and mammographic images. The U-Net exhibits superior performance in challenging cases involving irregular shapes, indistinct boundaries, and high tumor heterogeneity. In contrast, the pretrained SAM architecture exhibits limitations in accurately identifying tumor areas, particularly for malignant tumors and objects with weak boundaries or complex shapes. These findings highlight the importance of selecting appropriate deep learning architectures tailored for medical image segmentation. The U-Net model showcases its potential as a robust and accurate tool for tumor detection, while the pretrained SAM architecture suggests the need for further improvements to enhance segmentation performance.
    State-wise Constrained Policy Optimization. (arXiv:2306.12594v1 [cs.LG])
    Reinforcement Learning (RL) algorithms have shown tremendous success in simulation environments, but their application to real-world problems faces significant challenges, with safety being a major concern. In particular, enforcing state-wise constraints is essential for many challenging tasks such as autonomous driving and robot manipulation. However, existing safe RL algorithms under the framework of Constrained Markov Decision Process (CMDP) do not consider state-wise constraints. To address this gap, we propose State-wise Constrained Policy Optimization (SCPO), the first general-purpose policy search algorithm for state-wise constrained reinforcement learning. SCPO provides guarantees for state-wise constraint satisfaction in expectation. In particular, we introduce the framework of Maximum Markov Decision Process, and prove that the worst-case safety violation is bounded under SCPO. We demonstrate the effectiveness of our approach on training neural network policies for extensive robot locomotion tasks, where the agent must satisfy a variety of state-wise safety constraints. Our results show that SCPO significantly outperforms existing methods and can handle state-wise constraints in high-dimensional robotics tasks.
    Don't be so Monotone: Relaxing Stochastic Line Search in Over-Parameterized Models. (arXiv:2306.12747v1 [math.OC])
    Recent works have shown that line search methods can speed up Stochastic Gradient Descent (SGD) and Adam in modern over-parameterized settings. However, existing line searches may take steps that are smaller than necessary since they require a monotone decrease of the (mini-)batch objective function. We explore nonmonotone line search methods to relax this condition and possibly accept larger step sizes. Despite the lack of a monotonic decrease, we prove the same fast rates of convergence as in the monotone case. Our experiments show that nonmonotone methods improve the speed of convergence and generalization properties of SGD/Adam even beyond the previous monotone line searches. We propose a POlyak NOnmonotone Stochastic (PoNoS) method, obtained by combining a nonmonotone line search with a Polyak initial step size. Furthermore, we develop a new resetting technique that in the majority of the iterations reduces the amount of backtracks to zero while still maintaining a large initial step size. To the best of our knowledge, a first runtime comparison shows that the epoch-wise advantage of line-search-based methods gets reflected in the overall computational time.
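The nonmonotone acceptance rule described above replaces "decrease below the current value" with "decrease below the worst of the last few values", which lets larger steps survive a temporary increase in the objective. A minimal backtracking sketch under that relaxed condition (not the full PoNoS method; the Polyak initial step size and the resetting technique are omitted, and all names are illustrative):

```python
def nonmonotone_backtracking(f, x, grad, history, t0=1.0, c=1e-4,
                             rho=0.5, window=10):
    """Backtracking line search with a nonmonotone (max-over-window)
    acceptance rule: the trial point only has to beat the *worst* of the
    last `window` recorded function values, minus a sufficient-decrease
    term c * t * ||grad||^2."""
    ref = max(history[-window:])  # nonmonotone reference value
    g2 = sum(g * g for g in grad)
    t = t0
    while f([xi - t * gi for xi, gi in zip(x, grad)]) > ref - c * t * g2:
        t *= rho  # shrink until the relaxed condition holds
    return t
```

With a monotone rule the reference would be `history[-1]`; taking the max over a window can only make the condition easier to satisfy, so the accepted step size is at least as large.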
    Can Differentiable Decision Trees Learn Interpretable Reward Functions?. (arXiv:2306.13004v1 [cs.LG])
There is an increasing interest in learning reward functions that model human intent and human preferences. However, many frameworks use blackbox learning methods that, while expressive, are difficult to interpret. We propose and evaluate a novel approach for learning expressive and interpretable reward functions from preferences using Differentiable Decision Trees (DDTs) for both low- and high-dimensional state inputs. We explore and discuss the viability of learning interpretable reward functions using DDTs by evaluating our algorithm on Cartpole, Visual Gridworld environments, and Atari games. We provide evidence that the tree structure of our learned reward function is useful in determining the extent to which a reward function is aligned with human preferences. We visualize the learned reward DDTs and find that they are capable of learning interpretable reward functions but that the discrete nature of the trees hurts the performance of reinforcement learning at test time. However, we also show evidence that using soft outputs (averaged over all leaf nodes) results in competitive performance when compared with larger capacity deep neural network reward functions.
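A depth-1 differentiable decision tree makes the soft-output idea concrete: a sigmoid gate softly routes the state between leaf reward values, so the reward stays differentiable in all parameters and the output is an average over leaves. A hedged sketch (the paper's DDTs are deeper and learned from preferences; all names here are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_tree_reward(x, w, b, leaf_left, leaf_right):
    """Depth-1 differentiable decision tree: a sigmoid gate on w.x + b
    softly routes the input between two leaf reward values, so the
    whole reward function is differentiable in (w, b, leaves)."""
    p_left = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return p_left * leaf_left + (1.0 - p_left) * leaf_right
```

Hardening the gate (thresholding `p_left` at 0.5) recovers an ordinary, fully interpretable decision rule, which is exactly the discrete behavior the abstract reports as hurting test-time RL performance relative to the soft average.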
    CosmoPower-JAX: high-dimensional Bayesian inference with differentiable cosmological emulators. (arXiv:2305.06347v2 [astro-ph.CO] UPDATED)
    We present CosmoPower-JAX, a JAX-based implementation of the CosmoPower framework, which accelerates cosmological inference by building neural emulators of cosmological power spectra. We show how, using the automatic differentiation, batch evaluation and just-in-time compilation features of JAX, and running the inference pipeline on graphics processing units (GPUs), parameter estimation can be accelerated by orders of magnitude with advanced gradient-based sampling techniques. These can be used to efficiently explore high-dimensional parameter spaces, such as those needed for the analysis of next-generation cosmological surveys. We showcase the accuracy and computational efficiency of CosmoPower-JAX on two simulated Stage IV configurations. We first consider a single survey performing a cosmic shear analysis totalling 37 model parameters. We validate the contours derived with CosmoPower-JAX and a Hamiltonian Monte Carlo sampler against those derived with a nested sampler and without emulators, obtaining a speed-up factor of $\mathcal{O}(10^3)$. We then consider a combination of three Stage IV surveys, each performing a joint cosmic shear and galaxy clustering (3x2pt) analysis, for a total of 157 model parameters. Even with such a high-dimensional parameter space, CosmoPower-JAX provides converged posterior contours in 3 days, as opposed to the estimated 6 years required by standard methods. CosmoPower-JAX is fully written in Python, and we make it publicly available to help the cosmological community meet the accuracy requirements set by next-generation surveys.
    Improving Proactive Dialog Agents Using Socially-Aware Reinforcement Learning. (arXiv:2211.15359v2 [cs.CL] UPDATED)
    The next step for intelligent dialog agents is to escape their role as silent bystanders and become proactive. Well-defined proactive behavior may improve human-machine cooperation, as the agent takes a more active role during interaction and takes off responsibility from the user. However, proactivity is a double-edged sword because poorly executed pre-emptive actions may have a devastating effect not only on the task outcome but also on the relationship with the user. For designing adequate proactive dialog strategies, we propose a novel approach including both social as well as task-relevant features in the dialog. Here, the primary goal is to optimize proactive behavior so that it is task-oriented - this implies high task success and efficiency - while also being socially effective by fostering user trust. Including both aspects in the reward function for training a proactive dialog agent using reinforcement learning showed the benefit of our approach for more successful human-machine cooperation.
    Explore, Establish, Exploit: Red Teaming Language Models from Scratch. (arXiv:2306.09442v2 [cs.CL] UPDATED)
Deploying large language models (LLMs) can pose hazards from harmful outputs such as toxic or dishonest speech. Prior work has introduced tools that elicit harmful outputs in order to identify and mitigate these risks. While this is a valuable step toward securing language models, these approaches typically rely on a pre-existing classifier for undesired outputs. This limits their application to situations where the type of harmful behavior is known with precision beforehand. However, this skips a central challenge of red teaming: developing a contextual understanding of the behaviors that a model can exhibit. Furthermore, when such a classifier already exists, red teaming has limited marginal value because the classifier could simply be used to filter training data or model outputs. In this work, we consider red teaming under the assumption that the adversary is working from a high-level, abstract specification of undesired behavior. The red team is expected to refine/extend this specification and identify methods to elicit this behavior from the model. Our red teaming framework consists of three steps: 1) Exploring the model's behavior in the desired context; 2) Establishing a measurement of undesired behavior (e.g., a classifier trained to reflect human evaluations); and 3) Exploiting the model's flaws using this measure and an established red teaming methodology. We apply this approach to red team GPT-2 and GPT-3 models to systematically discover classes of prompts that elicit toxic and dishonest statements. In doing so, we also construct and release the CommonClaim dataset of 20,000 statements that have been labeled by human subjects as common-knowledge-true, common-knowledge-false, or neither. Code is available at https://github.com/thestephencasper/explore_establish_exploit_llms. CommonClaim is available at https://github.com/Algorithmic-Alignment-Lab/CommonClaim.
    Graph Pooling for Graph Neural Networks: Progress, Challenges, and Opportunities. (arXiv:2204.07321v2 [cs.LG] UPDATED)
    Graph neural networks have emerged as a leading architecture for many graph-level tasks, such as graph classification and graph generation. As an essential component of the architecture, graph pooling is indispensable for obtaining a holistic graph-level representation of the whole graph. Although a great variety of methods have been proposed in this promising and fast-developing research field, to the best of our knowledge, little effort has been made to systematically summarize these works. To set the stage for the development of future works, in this paper, we attempt to fill this gap by providing a broad review of recent methods for graph pooling. Specifically, 1) we first propose a taxonomy of existing graph pooling methods with a mathematical summary for each category; 2) then, we provide an overview of the libraries related to graph pooling, including the commonly used datasets, model architectures for downstream tasks, and open-source implementations; 3) next, we further outline the applications that incorporate the idea of graph pooling in a variety of domains; 4) finally, we discuss certain critical challenges facing current studies and share our insights on future potential directions for research on the improvement of graph pooling.
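The readout role of pooling can be seen in miniature with the simplest "flat" operator covered by such surveys, mean pooling, which collapses all node feature vectors into one graph-level vector (a minimal pure-Python sketch; real libraries operate on batched tensors and learnable pooling operators):

```python
def mean_pool(node_features):
    """Average per-node feature vectors into one graph-level vector."""
    n = len(node_features)
    dim = len(node_features[0])
    return [sum(f[d] for f in node_features) / n for d in range(dim)]

# A toy 3-node graph with 2-dimensional node features.
graph = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
readout = mean_pool(graph)  # -> [3.0, 4.0]
```

Hierarchical pooling methods replace this single global readout with learned, stepwise coarsening of the graph, but the interface (node features in, one representation out) is the same.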
    On-line reinforcement learning for optimization of real-life energy trading strategy. (arXiv:2303.16266v2 [cs.LG] UPDATED)
    An increasing share of energy is produced from renewable sources by many small producers. The efficiency of those sources is volatile and, to some extent, random, exacerbating the problem of energy market balancing. In many countries, this balancing is done on the day-ahead (DA) energy markets. This paper considers automated trading on the DA energy market by a medium-sized prosumer. We model this activity as a Markov Decision Process and formalize a framework in which a strategy applicable in real life can be optimized with off-line data. We design a trading strategy that is fed with the available environmental information that can impact future prices, including weather forecasts. We use state-of-the-art reinforcement learning (RL) algorithms to optimize this strategy. For comparison, we also synthesize a simple parametric trading strategy and optimize it with an evolutionary algorithm. Results show that our RL-based strategy generates the highest market profits.
    Online Self-Supervised Learning in Machine Learning Intrusion Detection for the Internet of Things. (arXiv:2306.13030v1 [cs.CR])
    This paper proposes a novel Self-Supervised Intrusion Detection (SSID) framework, which enables a fully online Machine Learning (ML) based Intrusion Detection System (IDS) that requires no human intervention or prior off-line learning. The proposed framework analyzes and labels incoming traffic packets based only on the decisions of the IDS itself using an Auto-Associative Deep Random Neural Network, and on an online estimate of its statistically measured trustworthiness. The SSID framework enables IDS to adapt rapidly to time-varying characteristics of the network traffic, and eliminates the need for offline data collection. This approach avoids human errors in data labeling, and human labor and computational costs of model training and data collection. The approach is experimentally evaluated on public datasets and compared with well-known ML models, showing that this SSID framework is very useful and advantageous as an accurate and online learning ML-based IDS for IoT systems.
    Learning topological defects formation with neural networks in a quantum phase transition. (arXiv:2204.06769v2 [cond-mat.dis-nn] CROSS LISTED)
    Neural networks possess formidable representational power, rendering them invaluable in solving complex quantum many-body systems. While they excel at analyzing static solutions, nonequilibrium processes, including critical dynamics during a quantum phase transition, pose a greater challenge for neural networks. To address this, we utilize neural networks and machine learning algorithms to investigate the time evolutions, universal statistics, and correlations of topological defects in a one-dimensional transverse-field quantum Ising model. Specifically, our analysis involves computing the energy of the system during a quantum phase transition following a linear quench of the transverse magnetic field strength. The excitation energies satisfy a power-law relation to the quench rate, indicating a proportional relationship between the excitation energy and the kink numbers. Moreover, we establish a universal power-law relationship between the first three cumulants of the kink numbers and the quench rate, indicating a binomial distribution of the kinks. Finally, the normalized kink-kink correlations are also investigated and it is found that the numerical values are consistent with the analytic formula.
    Finding neural signatures for obesity through feature selection on source-localized EEG. (arXiv:2208.14007v3 [cs.LG] UPDATED)
    Obesity is a serious issue in modern society and is often associated with a significantly reduced quality of life. Current research exploring obesity-related neurological evidence using electroencephalography (EEG) data is limited to traditional approaches. In this study, we developed a novel machine learning model to identify brain networks of obese females using alpha band functional connectivity features derived from EEG data. An overall classification accuracy of 0.937 is achieved. Our findings suggest that the obese brain is characterized by a dysfunctional network in which the areas responsible for processing self-referential information and environmental context information are impaired.
    FDINet: Protecting against DNN Model Extraction via Feature Distortion Index. (arXiv:2306.11338v2 [cs.CR] UPDATED)
    Machine Learning as a Service (MLaaS) platforms have gained popularity due to their accessibility, cost-efficiency, scalability, and rapid development capabilities. However, recent research has highlighted the vulnerability of cloud-based models in MLaaS to model extraction attacks. In this paper, we introduce FDINET, a novel defense mechanism that leverages the feature distribution of deep neural network (DNN) models. Concretely, by analyzing the feature distribution from the adversary's queries, we reveal that the feature distribution of these queries deviates from that of the model's training set. Based on this key observation, we propose Feature Distortion Index (FDI), a metric designed to quantitatively measure the feature distribution deviation of received queries. The proposed FDINET utilizes FDI to train a binary detector and exploits FDI similarity to identify colluding adversaries from distributed extraction attacks. We conduct extensive experiments to evaluate FDINET against six state-of-the-art extraction attacks on four benchmark datasets and four popular model architectures. Empirical results demonstrate the following findings: FDINET proves to be highly effective in detecting model extraction, achieving a 100% detection accuracy on DFME and DaST. FDINET is highly efficient, using just 50 queries to raise an extraction alarm with an average confidence of 96.08% for GTSRB. FDINET exhibits the capability to identify colluding adversaries with an accuracy exceeding 91%. Additionally, it demonstrates the ability to detect two types of adaptive attacks.
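The key observation, that extraction queries drift away from the training feature distribution, can be illustrated with a toy distortion score (a hypothetical stand-in; the paper's actual FDI definition differs in detail):

```python
import statistics

def distortion_score(train_feats, query_feats):
    """How far the mean of a query batch's features drifts from the
    defender's training-feature distribution, in training-std units."""
    mu = statistics.mean(train_feats)
    sigma = statistics.pstdev(train_feats)
    return abs(statistics.mean(query_feats) - mu) / sigma

train = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]   # features of in-distribution data
benign = [0.9, 1.0, 1.1, 1.0]              # queries resembling training data
extraction = [4.8, 5.1, 5.0, 4.9]          # synthetic out-of-distribution queries

# Extraction-style queries score far higher than benign ones, which is
# what lets a binary detector be trained on such a statistic.
print(distortion_score(train, benign), distortion_score(train, extraction))
```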
    Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories. (arXiv:2210.06518v3 [cs.LG] UPDATED)
    Natural agents can effectively learn from multiple data sources that differ in size, quality, and types of measurements. We study this heterogeneity in the context of offline reinforcement learning (RL) by introducing a new, practically motivated semi-supervised setting. Here, an agent has access to two sets of trajectories: labelled trajectories containing state, action and reward triplets at every timestep, along with unlabelled trajectories that contain only state and reward information. For this setting, we develop and study a simple meta-algorithmic pipeline that learns an inverse dynamics model on the labelled data to obtain proxy-labels for the unlabelled data, followed by the use of any offline RL algorithm on the true and proxy-labelled trajectories. Empirically, we find this simple pipeline to be highly successful -- on several D4RL benchmarks~\cite{fu2020d4rl}, certain offline RL algorithms can match the performance of variants trained on a fully labelled dataset even when we label only 10\% of trajectories which are highly suboptimal. To strengthen our understanding, we perform a large-scale controlled empirical study investigating the interplay of data-centric properties of the labelled and unlabelled datasets, with algorithmic design choices (e.g., choice of inverse dynamics, offline RL algorithm) to identify general trends and best practices for training RL agents on semi-supervised offline datasets.
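The meta-algorithmic pipeline can be sketched in a toy form (a sketch, not the paper's exact models): learn an inverse dynamics model on labelled (s, a, s') triples, then attach proxy action labels to the action-free (s, s') transitions; a 1-nearest-neighbour lookup stands in for the learned inverse dynamics model here.

```python
def fit_inverse_dynamics(labelled):
    """labelled: list of (s, a, s_next); returns predictor (s, s_next) -> a."""
    def predict(s, s_next):
        nearest = min(labelled,
                      key=lambda t: (t[0] - s) ** 2 + (t[2] - s_next) ** 2)
        return nearest[1]
    return predict

def proxy_label(labelled, unlabelled):
    """Attach proxy actions to action-free (s, s_next) transitions."""
    inv = fit_inverse_dynamics(labelled)
    return [(s, inv(s, s_next), s_next) for s, s_next in unlabelled]

# Deterministic toy dynamics on a line: the action moves the state by +-1.
labelled = [(0.0, +1, 1.0), (1.0, -1, 0.0), (2.0, +1, 3.0)]
unlabelled = [(0.1, 1.1), (1.9, 2.9)]
proxied = proxy_label(labelled, unlabelled)   # both get proxy action +1
```

Any offline RL algorithm can then be run on the union of the true and proxy-labelled trajectories.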
    CounterNet: End-to-End Training of Prediction Aware Counterfactual Explanations. (arXiv:2109.07557v3 [cs.LG] UPDATED)
    This work presents CounterNet, a novel end-to-end learning framework which integrates Machine Learning (ML) model training and the generation of corresponding counterfactual (CF) explanations into a single end-to-end pipeline. Counterfactual explanations offer a contrastive case, i.e., they attempt to find the smallest modification to the feature values of an instance that changes the prediction of the ML model on that instance to a predefined output. Prior techniques for generating CF explanations suffer from two major limitations: (i) all of them are post-hoc methods designed for use with proprietary ML models -- as a result, their procedure for generating CF explanations is uninformed by the training of the ML model, which leads to misalignment between model predictions and explanations; and (ii) most of them rely on solving separate time-intensive optimization problems to find CF explanations for each input data point (which negatively impacts their runtime). CounterNet thus makes a novel departure from the prevalent post-hoc paradigm: unlike post-hoc methods, it enables the optimization of CF explanation generation only once, together with the predictive model. We adopt a block-wise coordinate descent procedure which helps in effectively training CounterNet's network. Our extensive experiments on multiple real-world datasets show that CounterNet generates high-quality predictions, consistently achieves 100% CF validity and low proximity scores (thereby achieving a well-balanced cost-invalidity trade-off) for any new input instance, and runs 3X faster than existing state-of-the-art baselines.
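The per-instance optimization that post-hoc methods solve has a closed form in the simplest case, which makes the "smallest modification" notion concrete (an illustrative sketch, not CounterNet itself): for a linear scorer f(x) = w·x + b, the minimal L2 change flipping the sign of f is a step along w onto (just past) the decision boundary.

```python
def f(w, b, x):
    """Linear score w.x + b; sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def linear_counterfactual(w, b, x, eps=1e-6):
    """Smallest L2 perturbation of x that flips the sign of f."""
    score = f(w, b, x)
    step = -(score / sum(wi * wi for wi in w)) * (1 + eps)  # slight overshoot
    return [xi + step * wi for wi, xi in zip(w, x)]

w, b = [1.0, -2.0], 0.5
x = [2.0, 0.0]                       # f(x) = 2.5 -> positive class
cf = linear_counterfactual(w, b, x)  # minimally perturbed counterfactual
```

For nonlinear models this search has no closed form, which is why post-hoc methods pay a per-input optimization cost; CounterNet amortizes it by learning the generator jointly with the predictor.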
    Reinforcement Federated Learning Method Based on Adaptive OPTICS Clustering. (arXiv:2306.12859v1 [cs.LG])
    Federated learning is a distributed machine learning technology which balances data privacy protection with shared computation. To protect data privacy, federated learning learns shared models by executing distributed training locally on participating devices and aggregating the local models into a global model. A known problem in federated learning is the negative impact caused by non-independent and identically distributed data across different user terminals. To alleviate this problem, this paper proposes a strengthened federated aggregation method based on adaptive OPTICS clustering. Specifically, the method perceives the clustering environment as a Markov decision process and models the adjustment of the parameter search direction, so as to find the best clustering parameters and thereby the best federated aggregation method. The core contribution of this paper is an adaptive OPTICS clustering algorithm for federated learning. The algorithm combines OPTICS clustering with adaptive learning and can effectively deal with non-independent and identically distributed data across different user terminals. By perceiving the clustering environment as a Markov decision process, the goal is to find the best parameters of the OPTICS clustering without manual assistance, so as to obtain the best federated aggregation method and achieve better performance. The reliability and practicability of this method have been verified on experimental data, and its effectiveness and superiority have been demonstrated.
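The aggregation step that cluster-aware schemes like this one refine is plain federated averaging: a dataset-size-weighted average of local model parameters (a minimal sketch; the paper additionally groups clients via OPTICS clustering before aggregating):

```python
def fed_avg(local_models, sizes):
    """Weighted average of local parameter vectors by local dataset size."""
    total = sum(sizes)
    dim = len(local_models[0])
    return [sum(m[d] * n for m, n in zip(local_models, sizes)) / total
            for d in range(dim)]

# Two clients' parameter vectors; the second client has 3x the data.
clients = [[1.0, 0.0], [3.0, 2.0]]
global_model = fed_avg(clients, sizes=[1, 3])  # -> [2.5, 1.5]
```

Under non-i.i.d. client data, a single such average can be dominated by skewed clients, which motivates clustering clients first and aggregating within or across clusters.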
    Stock Price Prediction using Dynamic Neural Networks. (arXiv:2306.12969v1 [q-fin.ST])
    This paper will analyze and implement a time series dynamic neural network to predict daily closing stock prices. Neural networks possess unsurpassed abilities in identifying underlying patterns in chaotic, non-linear, and seemingly random data, thus providing a mechanism to predict stock price movements much more precisely than many current techniques. Contemporary methods for stock analysis, including fundamental, technical, and regression techniques, are discussed and compared with the performance of neural networks. Also, the Efficient Market Hypothesis (EMH) is presented and contrasted with Chaos theory using neural networks. This paper will refute the EMH and support Chaos theory. Finally, recommendations for using neural networks in stock price prediction will be presented.
    TSMixer: An all-MLP Architecture for Time Series Forecasting. (arXiv:2303.06053v3 [cs.LG] UPDATED)
    Real-world time-series datasets are often multivariate with complex dynamics. To capture this complexity, high capacity architectures like recurrent- or attention-based sequential deep learning models have become popular. However, recent work demonstrates that simple univariate linear models can outperform such deep learning models on several commonly used academic benchmarks. Extending them, in this paper, we investigate the capabilities of linear models for time-series forecasting and present Time-Series Mixer (TSMixer), a novel architecture designed by stacking multi-layer perceptrons (MLPs). TSMixer is based on mixing operations along both the time and feature dimensions to extract information efficiently. On popular academic benchmarks, the simple-to-implement TSMixer is comparable to specialized state-of-the-art models that leverage the inductive biases of specific benchmarks. On the challenging and large scale M5 benchmark, a real-world retail dataset, TSMixer demonstrates superior performance compared to the state-of-the-art alternatives. Our results underline the importance of efficiently utilizing cross-variate and auxiliary information for improving the performance of time series forecasting. We present various analyses to shed light into the capabilities of TSMixer. The design paradigms utilized in TSMixer are expected to open new horizons for deep learning-based time series forecasting. The implementation is available at https://github.com/google-research/google-research/tree/master/tsmixer
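The core mixing idea can be sketched with normalisation and residual connections stripped away (an assumed simplification of the architecture): alternate a linear map applied along the time axis with one applied along the feature axis of an input of shape (time_steps, features).

```python
def matmul(A, B):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def mixer_block(X, W_time, W_feat):
    X = matmul(W_time, X)  # time-mixing: combine information across time steps
    X = matmul(X, W_feat)  # feature-mixing: combine information across variates
    return X

X = [[1.0, 2.0],           # 2 time steps x 2 features
     [3.0, 4.0]]
W_time = [[0.5, 0.5],      # toy time-mixing weights (averaging over time)
          [0.5, 0.5]]
W_feat = [[1.0, 0.0],      # toy feature-mixing weights (identity)
          [0.0, 1.0]]
out = mixer_block(X, W_time, W_feat)   # -> [[2.0, 3.0], [2.0, 3.0]]
```

Stacking such blocks is what lets the all-MLP model exploit both temporal and cross-variate information without attention or recurrence.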
    FFCV: Accelerating Training by Removing Data Bottlenecks. (arXiv:2306.12517v1 [cs.LG])
    We present FFCV, a library for easy and fast machine learning model training. FFCV speeds up model training by eliminating (often subtle) data bottlenecks from the training process. In particular, we combine techniques such as an efficient file storage format, caching, data pre-loading, asynchronous data transfer, and just-in-time compilation to (a) make data loading and transfer significantly more efficient, ensuring that GPUs can reach full utilization; and (b) offload as much data processing as possible to the CPU asynchronously, freeing GPU cycles for training. Using FFCV, we train ResNet-18 and ResNet-50 on the ImageNet dataset with competitive tradeoff between accuracy and training time. For example, we are able to train an ImageNet ResNet-50 model to 75\% in only 20 mins on a single machine. We demonstrate FFCV's performance, ease-of-use, extensibility, and ability to adapt to resource constraints through several case studies. Detailed installation instructions, documentation, and Slack support channel are available at https://ffcv.io/ .
    Improving Long-Horizon Imitation Through Instruction Prediction. (arXiv:2306.12554v1 [cs.LG])
    Complex, long-horizon planning and its combinatorial nature pose steep challenges for learning-based agents. Difficulties in such settings are exacerbated in low data regimes where over-fitting stifles generalization and compounding errors hurt accuracy. In this work, we explore the use of an often unused source of auxiliary supervision: language. Inspired by recent advances in transformer-based models, we train agents with an instruction prediction loss that encourages learning temporally extended representations that operate at a high level of abstraction. Concretely, we demonstrate that instruction modeling significantly improves performance in planning environments when training with a limited number of demonstrations on the BabyAI and Crafter benchmarks. In further analysis we find that instruction modeling is most important for tasks that require complex reasoning, while understandably offering smaller gains in environments that require simple plans. More details and code can be found at https://github.com/jhejna/instruction-prediction.
    Scrutinizing XAI using linear ground-truth data with suppressor variables. (arXiv:2111.07473v2 [stat.ML] UPDATED)
    Machine learning (ML) is increasingly often used to inform high-stakes decisions. As complex ML models (e.g., deep neural networks) are often considered black boxes, a wealth of procedures has been developed to shed light on their inner workings and the ways in which their predictions come about, defining the field of 'explainable AI' (XAI). Saliency methods rank input features according to some measure of 'importance'. Such methods are difficult to validate since a formal definition of feature importance is, thus far, lacking. It has been demonstrated that some saliency methods can highlight features that have no statistical association with the prediction target (suppressor variables). To avoid misinterpretations due to such behavior, we propose the actual presence of such an association as a necessary condition and objective preliminary definition for feature importance. We carefully crafted a ground-truth dataset in which all statistical dependencies are well-defined and linear, serving as a benchmark to study the problem of suppressor variables. We evaluate common explanation methods including LRP, DTD, PatternNet, PatternAttribution, LIME, Anchors, SHAP, and permutation-based methods with respect to our objective definition. We show that most of these methods are unable to distinguish important features from suppressors in this setting.
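The suppressor phenomenon is easy to reproduce in a toy linear benchmark (illustrative numbers, not the paper's dataset): feature x2 has zero statistical association with the target y, yet the optimal linear model gives it a non-zero weight, because it cancels the distractor mixed into x1.

```python
def solve_2x2(a, b, c, d, e, f):
    """Solve [[a, b], [c, d]] @ w = [e, f] by Cramer's rule."""
    det = a * d - b * c
    return [(e * d - b * f) / det, (a * f - e * c) / det]

dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))

z = [1.0, -1.0, 1.0, -1.0]   # signal; the target is y = z
d_ = [1.0, 1.0, -1.0, -1.0]  # distractor, orthogonal to the signal
x1 = [zi + di for zi, di in zip(z, d_)]  # measured feature: signal + distractor
x2 = d_                      # suppressor: carries the distractor only
y = z

# Ordinary least squares via the normal equations (two features).
w = solve_2x2(dot(x1, x1), dot(x1, x2),
              dot(x1, x2), dot(x2, x2),
              dot(x1, y), dot(x2, y))
# dot(x2, y) == 0, yet w == [1.0, -1.0]: the model uses the suppressor.
```

A saliency method that highlights x2 as "important" is therefore describing the model, not a statistical association with the target, which is exactly the ambiguity the benchmark is built to expose.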
    OptIForest: Optimal Isolation Forest for Anomaly Detection. (arXiv:2306.12703v1 [cs.LG])
    Anomaly detection plays an increasingly important role in various fields for critical tasks such as intrusion detection in cybersecurity, financial risk detection, and human health monitoring. A variety of anomaly detection methods have been proposed, and a category based on the isolation forest mechanism stands out due to its simplicity, effectiveness, and efficiency, e.g., iForest is often employed as a state-of-the-art detector for real deployment. While the majority of isolation forests use the binary structure, a framework LSHiForest has demonstrated that the multi-fork isolation tree structure can lead to better detection performance. However, there is no theoretical work answering the fundamentally and practically important question on the optimal tree structure for an isolation forest with respect to the branching factor. In this paper, we establish a theory on isolation efficiency to answer the question and determine the optimal branching factor for an isolation tree. Based on the theoretical underpinning, we design a practical optimal isolation forest OptIForest incorporating clustering based learning to hash which enables more information to be learned from data for better isolation quality. The rationale of our approach relies on a better bias-variance trade-off achieved by bias reduction in OptIForest. Extensive experiments on a series of benchmarking datasets for comparative and ablation studies demonstrate that our approach can efficiently and robustly achieve better detection performance in general than the state-of-the-arts including the deep learning based methods.
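One intuition for why a non-binary branching factor can pay off (an assumed, simplified cost model echoing the classic radix-economy argument, not the paper's isolation-efficiency theory): a balanced v-ary tree isolates n points at depth log_v(n), while each internal node costs roughly v, so total cost scales like v / ln(v), which over integers is minimised at v = 3.

```python
import math

def isolation_cost(v, n):
    """Toy cost of isolating n points with a balanced v-ary tree:
    per-node cost ~v times depth log_v(n)."""
    return v * math.log(n) / math.log(v)

costs = {v: isolation_cost(v, n=10_000) for v in (2, 3, 4, 8)}
best = min(costs, key=costs.get)   # -> 3, not the binary v = 2
```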
    Conditional Generators for Limit Order Book Environments: Explainability, Challenges, and Robustness. (arXiv:2306.12806v1 [q-fin.TR])
    Limit order books are a fundamental and widespread market mechanism. This paper investigates the use of conditional generative models for order book simulation. For developing a trading agent, this approach has drawn recent attention as an alternative to traditional backtesting due to its ability to react to the presence of the trading agent. Using a state-of-the-art CGAN (from Coletta et al. (2022)), we explore its dependence upon input features, which highlights both strengths and weaknesses. To do this, we use "adversarial attacks" on the model's features and its mechanism. We then show how these insights can be used to improve the CGAN, both in terms of its realism and robustness. We finish by laying out a roadmap for future work.
    Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference. (arXiv:2306.12509v1 [cs.CL])
    We view large language models (LLMs) as stochastic \emph{language layers} in a network, where the learnable parameters are the natural language \emph{prompts} at each layer. We stack two such layers, feeding the output of one layer to the next. We call the stacked architecture a \emph{Deep Language Network} (DLN). We first show how to effectively perform prompt optimization for a 1-Layer language network (DLN-1). We then show how to train 2-layer DLNs (DLN-2), where two prompts must be learnt. We consider the output of the first layer as a latent variable to marginalize, and devise a variational inference algorithm for joint prompt training. A DLN-2 reaches higher performance than a single layer, sometimes comparable to few-shot GPT-4 even when each LLM in the network is smaller and less powerful. The DLN code is open source: https://github.com/microsoft/deep-language-networks .
    A Systematic Survey in Geometric Deep Learning for Structure-based Drug Design. (arXiv:2306.11768v2 [q-bio.QM] UPDATED)
    Structure-based drug design (SBDD), which utilizes the three-dimensional geometry of proteins to identify potential drug candidates, is becoming increasingly vital in drug discovery. However, traditional methods based on physiochemical modeling and experts' domain knowledge are time-consuming and laborious. The recent advancements in geometric deep learning, which integrates and processes 3D geometric data, coupled with the availability of accurate protein 3D structure predictions from tools like AlphaFold, have significantly propelled progress in structure-based drug design. In this paper, we systematically review the recent progress of geometric deep learning for structure-based drug design. We start with a brief discussion of the mainstream tasks in structure-based drug design, commonly used 3D protein representations and representative predictive/generative models. Then we delve into detailed reviews for each task (binding site prediction, binding pose generation, \emph{de novo} molecule generation, linker design, and binding affinity prediction), including the problem setup, representative methods, datasets, and evaluation metrics. Finally, we conclude this survey with the current challenges and highlight potential opportunities of geometric deep learning for structure-based drug design.
    Targeted collapse regularized autoencoder for anomaly detection: black hole at the center. (arXiv:2306.12627v1 [cs.LG])
    Autoencoders have been extensively used in the development of recent anomaly detection techniques. The premise of their application is based on the notion that after training the autoencoder on normal training data, anomalous inputs will exhibit a significant reconstruction error. Consequently, this enables a clear differentiation between normal and anomalous samples. In practice, however, it is observed that autoencoders can generalize beyond the normal class and achieve a small reconstruction error on some of the anomalous samples. To improve the performance, various techniques propose additional components and more sophisticated training procedures. In this work, we propose a remarkably straightforward alternative: instead of adding neural network components, involved computations, and cumbersome training, we complement the reconstruction loss with a computationally light term that regulates the norm of representations in the latent space. The simplicity of our approach minimizes the requirement for hyperparameter tuning and customization for new applications which, paired with its permissive data modality constraint, enhances the potential for successful adoption across a broad range of applications. We test the method on various visual and tabular benchmarks and demonstrate that the technique matches and frequently outperforms alternatives. We also provide a theoretical analysis and numerical simulations that help demonstrate the underlying process that unfolds during training and how it can help with anomaly detection. This mitigates the black-box nature of autoencoder-based anomaly detection algorithms and offers an avenue for further investigation of advantages, fail cases, and potential new directions.
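The assumed shape of the proposed objective can be written in a few lines: the usual reconstruction error plus a computationally light penalty on the latent norm, which pulls normal-class representations toward the origin ("targeted collapse"); `lam` here is an illustrative hyperparameter, not a value from the paper.

```python
def regularized_loss(x, x_hat, z, lam=0.1):
    """Reconstruction error plus a light penalty on the latent norm."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))   # reconstruction term
    latent = sum(v * v for v in z)                        # squared latent norm
    return recon + lam * latent

x, x_hat, z = [1.0, 2.0], [1.1, 1.9], [0.5, -0.5]
loss = regularized_loss(x, x_hat, z)   # ~= 0.02 + 0.1 * 0.5 = 0.07
```

At test time, anomalies are expected to stand out both by higher reconstruction error and by latent representations far from the collapsed region.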
    Sharp analysis of EM for learning mixtures of pairwise differences. (arXiv:2302.10066v2 [math.ST] UPDATED)
    We consider a symmetric mixture of linear regressions with random samples from the pairwise comparison design, which can be seen as a noisy version of a type of Euclidean distance geometry problem. We analyze the expectation-maximization (EM) algorithm locally around the ground truth and establish that the sequence converges linearly, providing an $\ell_\infty$-norm guarantee on the estimation error of the iterates. Furthermore, we show that the limit of the EM sequence achieves the sharp rate of estimation in the $\ell_2$-norm, matching the information-theoretically optimal constant. We also argue through simulation that convergence from a random initialization is much more delicate in this setting, and does not appear to occur in general. Our results show that the EM algorithm can exhibit several unique behaviors when the covariate distribution is suitably structured.
    Complex Preferences for Different Convergent Priors in Discrete Graph Diffusion. (arXiv:2306.02957v2 [cs.LG] UPDATED)
    Diffusion models have achieved state-of-the-art performance in generating many different kinds of data, including images, text, and videos. Despite their success, there has been limited research on how the underlying diffusion process and the final convergent prior can affect generative performance; this research has also been limited to continuous data types and a score-based diffusion framework. To fill this gap, we explore how different discrete diffusion kernels (which converge to different prior distributions) affect the performance of diffusion models for graphs. To this end, we developed a novel formulation of a family of discrete diffusion kernels which are easily adjustable to converge to different Bernoulli priors, and we study the effect of these different kernels on generative performance. We show that the quality of generated graphs is sensitive to the prior used, and that the optimal choice cannot be explained by obvious statistics or metrics, which challenges the intuitions which previous works have suggested.
    Sample Complexity for Quadratic Bandits: Hessian Dependent Bounds and Optimal Algorithms. (arXiv:2306.12383v2 [cs.LG] UPDATED)
    In stochastic zeroth-order optimization, a problem of practical relevance is understanding how to fully exploit the local geometry of the underlying objective function. We consider a fundamental setting in which the objective function is quadratic, and provide the first tight characterization of the optimal Hessian-dependent sample complexity. Our contribution is twofold. First, from an information-theoretic point of view, we prove tight lower bounds on Hessian-dependent complexities by introducing a concept called energy allocation, which captures the interaction between the searching algorithm and the geometry of objective functions. A matching upper bound is obtained by solving the optimal energy spectrum. Then, algorithmically, we show the existence of a Hessian-independent algorithm that universally achieves the asymptotic optimal sample complexities for all Hessian instances. The optimal sample complexities achieved by our algorithm remain valid for heavy-tailed noise distributions, which are enabled by a truncation method.
    Hierarchical Neural Simulation-Based Inference Over Event Ensembles. (arXiv:2306.12584v1 [stat.ML])
    When analyzing real-world data it is common to work with event ensembles, which comprise sets of observations that collectively constrain the parameters of an underlying model of interest. Such models often have a hierarchical structure, where "local" parameters impact individual events and "global" parameters influence the entire dataset. We introduce practical approaches for optimal dataset-wide probabilistic inference in cases where the likelihood is intractable, but simulations can be realized via forward modeling. We construct neural estimators for the likelihood(-ratio) or posterior and show that explicitly accounting for the model's hierarchical structure can lead to tighter parameter constraints. We ground our discussion using case studies from the physical sciences, focusing on examples from particle physics (particle collider data) and astrophysics (strong gravitational lensing observations).
    Fitted Value Iteration Methods for Bicausal Optimal Transport. (arXiv:2306.12658v1 [stat.ML])
    We develop a fitted value iteration (FVI) method to compute bicausal optimal transport (OT) where couplings have an adapted structure. Based on the dynamic programming formulation, FVI adopts a function class to approximate the value functions in bicausal OT. Under the concentrability condition and approximate completeness assumption, we prove the sample complexity using (local) Rademacher complexity. Furthermore, we demonstrate that multilayer neural networks with appropriate structures satisfy the crucial assumptions required in sample complexity proofs. Numerical experiments reveal that FVI outperforms linear programming and adapted Sinkhorn methods in scalability as the time horizon increases, while still maintaining acceptable accuracy.
    On the use of the Gram matrix for multivariate functional principal components analysis. (arXiv:2306.12949v1 [stat.ME])
    Dimension reduction is crucial in functional data analysis (FDA). The key tool to reduce the dimension of the data is functional principal component analysis. Existing approaches for functional principal component analysis usually involve the diagonalization of the covariance operator. With the increasing size and complexity of functional datasets, estimating the covariance operator has become more challenging. Therefore, there is a growing need for efficient methodologies to estimate the eigencomponents. Using the duality of the space of observations and the space of functional features, we propose to use the inner-product between the curves to estimate the eigenelements of multivariate and multidimensional functional datasets. The relationship between the eigenelements of the covariance operator and those of the inner-product matrix is established. We explore the application of these methodologies in several FDA settings and provide general guidance on their usability.
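The duality the abstract exploits is easy to check in discretised form with synthetic data (a sketch): the n x n inner-product (Gram) matrix X X^T / n and the p x p covariance X^T X / n share their non-zero eigenvalues, so with few curves observed on a fine grid (n << p) the eigencomponents can be estimated from the much smaller Gram matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 200                       # 5 curves, each discretised on 200 points
X = rng.standard_normal((n, p))

# Top n eigenvalues of the p x p covariance matrix ...
cov_top = np.linalg.eigvalsh(X.T @ X / n)[::-1][:n]
# ... equal all n eigenvalues of the n x n Gram matrix.
gram = np.linalg.eigvalsh(X @ X.T / n)[::-1]
```

The corresponding eigenfunctions are then recovered from the Gram eigenvectors by projecting the curves onto them, which is the computational saving the paper develops for the multivariate, multidimensional case.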
    Cryptocurrency Valuation: An Explainable AI Approach. (arXiv:2201.12893v7 [econ.GN] UPDATED)
    Currently, there are no convincing proxies for the fundamentals of cryptocurrency assets. We propose a new market-to-fundamental ratio, the price-to-utility (PU) ratio, utilizing unique blockchain accounting methods. We then proxy various existing fundamental-to-market ratios by Bitcoin historical data and find they have little predictive power for short-term bitcoin returns. However, the PU ratio predicts long-term bitcoin returns more effectively than alternative methods. Furthermore, we verify the explainability of the PU ratio using machine learning. Finally, we present an automated trading strategy advised by the PU ratio that outperforms the conventional buy-and-hold and market-timing strategies. Our research contributes to explainable AI in finance from three facets: First, our market-to-fundamental ratio is based on classic monetary theory and the unique UTXO model of Bitcoin accounting rather than being ad hoc; Second, the empirical evidence testifies to the buy-low and sell-high implications of the ratio; Finally, we distribute the trading algorithms as open-source software via the Python Package Index for future research, which is exceptional in finance research.
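    The buy-low/sell-high implication of a market-to-fundamental ratio can be sketched with made-up numbers; the utility proxy and thresholds below are hypothetical stand-ins, not the paper's UTXO-based construction:

```python
# Hypothetical illustration of a market-to-fundamental trading rule.
prices = [100, 80, 60, 90, 140, 120]
utility = [50, 50, 50, 50, 50, 50]          # stand-in fundamental value
pu = [p / u for p, u in zip(prices, utility)]

signals = []
for ratio in pu:
    if ratio < 1.5:       # "buy low": price cheap relative to fundamentals
        signals.append("buy")
    elif ratio > 2.5:     # "sell high": price rich relative to fundamentals
        signals.append("sell")
    else:
        signals.append("hold")

assert signals == ["hold", "hold", "buy", "hold", "sell", "hold"]
```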
    Inferring the finest pattern of mutual independence from data. (arXiv:2306.12984v1 [stat.ML])
    For a random variable $X$, we are interested in the blind extraction of its finest mutual independence pattern $\mu ( X )$. We introduce a specific kind of independence that we call dichotomic. If $\Delta ( X )$ stands for the set of all patterns of dichotomic independence that hold for $X$, we show that $\mu ( X )$ can be obtained as the intersection of all elements of $\Delta ( X )$. We then propose a method to estimate $\Delta ( X )$ when the data are independent and identically distributed (i.i.d.) realizations of a multivariate normal distribution. If $\hat{\Delta} ( X )$ is the estimated set of valid patterns of dichotomic independence, we estimate $\mu ( X )$ as the intersection of all patterns of $\hat{\Delta} ( X )$. The method is tested on simulated data, showing its advantages and limits. We also consider an application to a toy example as well as to experimental data.
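    The intersection step can be sketched as the common refinement (meet) of the two-block partitions of the variable indices: each dichotomic pattern splits the variables into two independent blocks, and variables end up together only if every pattern keeps them together. The helper below is an illustrative implementation of that set operation, not the paper's estimator:

```python
def meet(partitions, n_vars):
    """Common refinement: group variables by the tuple of blocks they fall
    into across all partitions."""
    def block_of(i, partition):
        return next(b for b, block in enumerate(partition) if i in block)

    groups = {}
    for i in range(n_vars):
        groups.setdefault(tuple(block_of(i, p) for p in partitions), []).append(i)
    return sorted(sorted(block) for block in groups.values())

# Two dichotomic patterns for X = (X0, X1, X2, X3):
d1 = [{0, 1}, {2, 3}]
d2 = [{0, 2}, {1, 3}]
assert meet([d1], 4) == [[0, 1], [2, 3]]          # one pattern alone
assert meet([d1, d2], 4) == [[0], [1], [2], [3]]  # intersection: full independence
```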
    Structured Learning in Time-dependent Cox Models. (arXiv:2306.12528v1 [stat.ME])
    Cox models with time-dependent coefficients and covariates are widely used in survival analysis. In high-dimensional settings, sparse regularization techniques are employed for variable selection, but existing methods for time-dependent Cox models lack flexibility in enforcing specific sparsity patterns (i.e., covariate structures). We propose a flexible framework for variable selection in time-dependent Cox models, accommodating complex selection rules. Our method can adapt to arbitrary grouping structures, including interaction selection, temporal, spatial, tree, and directed acyclic graph structures. It achieves accurate estimation with low false alarm rates. We develop the sox package, implementing a network flow algorithm for efficiently solving models with complex covariate structures. Sox offers a user-friendly interface for specifying grouping structures and delivers fast computation. Through examples, including a case study on identifying predictors of time to all-cause death in atrial fibrillation patients, we demonstrate the practical application of our method with specific selection rules.
    Finite-time Lyapunov exponents of deep neural networks. (arXiv:2306.12548v1 [cond-mat.dis-nn])
    We compute how small input perturbations affect the output of deep neural networks, exploring an analogy between deep networks and dynamical systems, where the growth or decay of local perturbations is characterised by finite-time Lyapunov exponents. We show that the maximal exponent forms geometrical structures in input space, akin to coherent structures in dynamical systems. Ridges of large positive exponents divide input space into different regions that the network associates with different classes. These ridges visualise the geometry that deep networks construct in input space, shedding light on the fundamental mechanisms underlying their learning capabilities.  ( 2 min )
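    The maximal finite-time Lyapunov exponent can be sketched for a toy network: propagate the input-output Jacobian layer by layer via the chain rule and take the per-layer log of its largest singular value. Random weights here stand in for a trained network:

```python
import numpy as np

rng = np.random.default_rng(1)
depth, width = 10, 8
weights = [rng.normal(scale=1.0 / np.sqrt(width), size=(width, width))
           for _ in range(depth)]

def max_ftle(x):
    """Maximal finite-time Lyapunov exponent: (1/depth) * log of the largest
    singular value of the input-output Jacobian at x."""
    jac = np.eye(width)
    for W in weights:
        x = np.tanh(W @ x)
        # chain rule: d tanh(Wx)/dx = diag(1 - tanh(Wx)^2) @ W
        jac = np.diag(1.0 - x**2) @ W @ jac
    sigma_max = np.linalg.svd(jac, compute_uv=False)[0]
    return np.log(sigma_max) / depth

lam = max_ftle(rng.normal(size=width))
assert np.isfinite(lam)
```

A positive value means nearby inputs diverge through the network; ridges of large positive exponents in input space are the structures the paper studies.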
    Generalized Low-Rank Update: Model Parameter Bounds for Low-Rank Training Data Modifications. (arXiv:2306.12670v1 [stat.ML])
    In this study, we have developed an incremental machine learning (ML) method that efficiently obtains the optimal model when a small number of instances or features are added or removed. This problem holds practical importance in model selection, such as cross-validation (CV) and feature selection. Among the class of ML methods known as linear estimators, there exists an efficient model update framework called the low-rank update that can effectively handle changes in a small number of rows and columns within the data matrix. However, for ML methods beyond linear estimators, there is currently no comprehensive framework available to obtain knowledge about the updated solution within a specific computational complexity. In light of this, our study introduces a method called the Generalized Low-Rank Update (GLRU) which extends the low-rank update framework of linear estimators to ML methods formulated as a certain class of regularized empirical risk minimization, including commonly used methods such as SVM and logistic regression. The proposed GLRU method not only expands the range of its applicability but also provides information about the updated solutions with a computational complexity proportional to the amount of dataset changes. To demonstrate the effectiveness of the GLRU method, we conduct experiments showcasing its efficiency in performing cross-validation and feature selection compared to other baseline methods.  ( 2 min )
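    For the linear-estimator baseline that GLRU generalizes, the classic low-rank update is the Sherman-Morrison formula; a minimal sketch of adding one instance to a least-squares fit without refitting from scratch:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 5))
A_inv = np.linalg.inv(X.T @ X)          # cached from the initial fit

# A new instance x_new arrives: update (X^T X)^{-1} in O(d^2) via
# Sherman-Morrison, (A + vv^T)^{-1} = A^{-1} - A^{-1}vv^T A^{-1} / (1 + v^T A^{-1} v)
x_new = rng.normal(size=5)
Av = A_inv @ x_new
A_inv_updated = A_inv - np.outer(Av, Av) / (1.0 + x_new @ Av)

# Matches a full recomputation on the augmented data
X_full = np.vstack([X, x_new])
assert np.allclose(A_inv_updated, np.linalg.inv(X_full.T @ X_full))
```

The paper's contribution is extending this kind of cheap update beyond linear estimators to regularized empirical risk minimizers such as SVMs and logistic regression.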
    Mitigating Discrimination in Insurance with Wasserstein Barycenters. (arXiv:2306.12912v1 [stat.ML])
    The insurance industry is heavily reliant on predictions of risks based on characteristics of potential customers. Although the use of said models is common, researchers have long pointed out that such practices perpetuate discrimination based on sensitive features such as gender or race. Given that such discrimination can often be attributed to historical data biases, an elimination or at least mitigation is desirable. With the shift from more traditional models to machine-learning based predictions, calls for greater mitigation have grown anew, as simply excluding sensitive variables in the pricing process can be shown to be ineffective. In this article, we first investigate why predictions are a necessity within the industry and why correcting biases is not as straightforward as simply identifying a sensitive variable. We then propose to ease the biases through the use of Wasserstein barycenters instead of simple scaling. To demonstrate the effects and effectiveness of the approach we employ it on real data and discuss its implications.  ( 2 min )
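    In one dimension, the Wasserstein barycenter of equally weighted distributions is obtained by averaging quantile functions, which sorted samples approximate. A toy sketch with hypothetical premium distributions for two groups (the group means and scales below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical premiums predicted for two groups of a sensitive attribute
group_a = np.sort(rng.normal(loc=100.0, scale=10.0, size=1000))
group_b = np.sort(rng.normal(loc=120.0, scale=10.0, size=1000))

# Sorted samples act as empirical quantile functions; averaging them gives
# the empirical 1D Wasserstein barycenter of the two distributions.
barycenter = 0.5 * (group_a + group_b)

# Mapping each group's quantiles onto the barycenter removes the mean gap
assert abs(barycenter.mean() - 0.5 * (group_a.mean() + group_b.mean())) < 1e-9
```

Pricing off the barycenter (instead of simply rescaling one group) equalizes the whole premium distribution across groups, not just its mean.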
    Classification Tree Pruning Under Covariate Shift. (arXiv:2305.04335v2 [stat.ML] UPDATED)
    We consider the problem of \emph{pruning} a classification tree, that is, selecting a suitable subtree that balances bias and variance, in common situations with inhomogeneous training data. Namely, assuming access to mostly data from a distribution $P_{X, Y}$, but little data from a desired distribution $Q_{X, Y}$ with different $X$-marginals, we present the first efficient procedure for optimal pruning in such situations, when cross-validation and other penalized variants are grossly inadequate. Optimality is derived with respect to a notion of \emph{average discrepancy} $P_{X} \to Q_{X}$ (averaged over $X$ space) which significantly relaxes a recent notion -- termed \emph{transfer-exponent} -- shown to tightly capture the limits of classification under such a distribution shift. Our relaxed notion can be viewed as a measure of \emph{relative dimension} between distributions, as it relates to existing notions of information such as the Minkowski and Renyi dimensions.  ( 2 min )
    Robust Statistical Comparison of Random Variables with Locally Varying Scale of Measurement. (arXiv:2306.12803v1 [stat.ML])
    Spaces with locally varying scale of measurement, like multidimensional structures with differently scaled dimensions, are pretty common in statistics and machine learning. Nevertheless, it is still understood as an open question how to exploit the entire information encoded in them properly. We address this problem by considering an order based on (sets of) expectations of random variables mapping into such non-standard spaces. This order contains stochastic dominance and expectation order as extreme cases when no, or respectively perfect, cardinal structure is given. We derive a (regularized) statistical test for our proposed generalized stochastic dominance (GSD) order, operationalize it by linear optimization, and robustify it by imprecise probability models. Our findings are illustrated with data from multidimensional poverty measurement, finance, and medicine.  ( 2 min )
    A One-Sample Decentralized Proximal Algorithm for Non-Convex Stochastic Composite Optimization. (arXiv:2302.09766v2 [math.OC] UPDATED)
    We focus on decentralized stochastic non-convex optimization, where $n$ agents work together to optimize a composite objective function which is a sum of a smooth term and a non-smooth convex term. To solve this problem, we propose two single-time scale algorithms: Prox-DASA and Prox-DASA-GT. These algorithms can find $\epsilon$-stationary points in $\mathcal{O}(n^{-1}\epsilon^{-2})$ iterations using constant batch sizes (i.e., $\mathcal{O}(1)$). Unlike prior work, our algorithms achieve comparable complexity without requiring large batch sizes, more complex per-iteration operations (such as double loops), or stronger assumptions. Our theoretical findings are supported by extensive numerical experiments, which demonstrate the superiority of our algorithms over previous approaches. Our code is available at https://github.com/xuxingc/ProxDASA.  ( 2 min )
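    The proximal step at the core of such composite methods can be illustrated with the $\ell_1$ norm, whose prox is soft-thresholding; this is a sketch of one proximal-gradient step, not the decentralized algorithm itself:

```python
def prox_l1(v, lam):
    """prox_{lam * ||.||_1}(v): elementwise soft-thresholding."""
    return [max(abs(x) - lam, 0.0) * (1 if x > 0 else -1 if x < 0 else 0)
            for x in v]

# One proximal-gradient step on the composite objective: x <- prox(x - step * grad)
x = [1.0, -0.1, 2.0]
grad = [0.2, 0.2, 0.5]
step, lam = 1.0, 0.3
x_new = prox_l1([xi - step * gi for xi, gi in zip(x, grad)], step * lam)

# Small coordinates are zeroed out, large ones are shrunk by step * lam
assert all(abs(a - b) < 1e-9 for a, b in zip(x_new, [0.5, 0.0, 1.2]))
```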
    Targeted collapse regularized autoencoder for anomaly detection: black hole at the center. (arXiv:2306.12627v1 [cs.LG])
    Autoencoders have been extensively used in the development of recent anomaly detection techniques. The premise of their application is based on the notion that after training the autoencoder on normal training data, anomalous inputs will exhibit a significant reconstruction error. Consequently, this enables a clear differentiation between normal and anomalous samples. In practice, however, it is observed that autoencoders can generalize beyond the normal class and achieve a small reconstruction error on some of the anomalous samples. To improve the performance, various techniques propose additional components and more sophisticated training procedures. In this work, we propose a remarkably straightforward alternative: instead of adding neural network components, involved computations, and cumbersome training, we complement the reconstruction loss with a computationally light term that regulates the norm of representations in the latent space. The simplicity of our approach minimizes the requirement for hyperparameter tuning and customization for new applications which, paired with its permissive data modality constraint, enhances the potential for successful adoption across a broad range of applications. We test the method on various visual and tabular benchmarks and demonstrate that the technique matches and frequently outperforms alternatives. We also provide a theoretical analysis and numerical simulations that help demonstrate the underlying process that unfolds during training and how it can help with anomaly detection. This mitigates the black-box nature of autoencoder-based anomaly detection algorithms and offers an avenue for further investigation of advantages, fail cases, and potential new directions.  ( 3 min )
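    A minimal sketch of the idea: a reconstruction loss complemented by a computationally light penalty on latent norms. The quadratic form below is an assumption for illustration; the paper's exact regularizer may differ:

```python
import numpy as np

rng = np.random.default_rng(4)

def regularized_loss(x, x_hat, z, lam=0.1):
    """Reconstruction error plus a light term regulating latent norms
    (pulling them toward zero -- the 'black hole at the center')."""
    recon = np.mean((x - x_hat) ** 2)
    collapse = lam * np.mean(z ** 2)
    return recon + collapse

x = rng.normal(size=16)
z = rng.normal(size=4)
loss = regularized_loss(x, x, z)          # perfect reconstruction
assert loss == 0.1 * np.mean(z ** 2)      # only the latent-norm term remains
```

Because anomalous inputs tend to land farther from the regulated region of latent space, their total score stays high even when the decoder generalizes well enough to reconstruct them.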
    SQ Lower Bounds for Learning Bounded Covariance GMMs. (arXiv:2306.13057v1 [cs.LG])
    We study the complexity of learning mixtures of separated Gaussians with common unknown bounded covariance matrix. Specifically, we focus on learning Gaussian mixture models (GMMs) on $\mathbb{R}^d$ of the form $P= \sum_{i=1}^k w_i \mathcal{N}(\boldsymbol \mu_i,\mathbf \Sigma_i)$, where $\mathbf \Sigma_i = \mathbf \Sigma \preceq \mathbf I$ and $\min_{i \neq j} \| \boldsymbol \mu_i - \boldsymbol \mu_j\|_2 \geq k^\epsilon$ for some $\epsilon>0$. Known learning algorithms for this family of GMMs have complexity $(dk)^{O(1/\epsilon)}$. In this work, we prove that any Statistical Query (SQ) algorithm for this problem requires complexity at least $d^{\Omega(1/\epsilon)}$. In the special case where the separation is on the order of $k^{1/2}$, we additionally obtain fine-grained SQ lower bounds with the correct exponent. Our SQ lower bounds imply similar lower bounds for low-degree polynomial tests. Conceptually, our results provide evidence that known algorithms for this problem are nearly best possible.  ( 2 min )
    Scrutinizing XAI using linear ground-truth data with suppressor variables. (arXiv:2111.07473v2 [stat.ML] UPDATED)
    Machine learning (ML) is increasingly often used to inform high-stakes decisions. As complex ML models (e.g., deep neural networks) are often considered black boxes, a wealth of procedures has been developed to shed light on their inner workings and the ways in which their predictions come about, defining the field of 'explainable AI' (XAI). Saliency methods rank input features according to some measure of 'importance'. Such methods are difficult to validate since a formal definition of feature importance is, thus far, lacking. It has been demonstrated that some saliency methods can highlight features that have no statistical association with the prediction target (suppressor variables). To avoid misinterpretations due to such behavior, we propose the actual presence of such an association as a necessary condition and objective preliminary definition for feature importance. We carefully crafted a ground-truth dataset in which all statistical dependencies are well-defined and linear, serving as a benchmark to study the problem of suppressor variables. We evaluate common explanation methods including LRP, DTD, PatternNet, PatternAttribution, LIME, Anchors, SHAP, and permutation-based methods with respect to our objective definition. We show that most of these methods are unable to distinguish important features from suppressors in this setting.  ( 2 min )
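    The suppressor phenomenon is easy to reproduce with linear ground-truth data: below, the second feature has no statistical association with the target, yet the optimal linear model assigns it a large weight. This is a toy construction in the spirit of the benchmark, not the paper's actual dataset:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
signal = rng.normal(size=n)
distractor = rng.normal(size=n)
X = np.column_stack([signal + distractor, distractor])   # x2 is a suppressor
y = signal                                               # target ignores x2

w, *_ = np.linalg.lstsq(X, y, rcond=None)

# x2 carries no statistical association with the target ...
assert abs(np.corrcoef(X[:, 1], y)[0, 1]) < 0.02
# ... yet the optimal model is y = x1 - x2, so |w| flags x2 as "important"
assert np.allclose(w, [1.0, -1.0], atol=0.02)
```

A saliency method that ranks features by model weight would call the suppressor important, which is exactly the misinterpretation the proposed definition is meant to rule out.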
    Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model. (arXiv:2306.12968v1 [cs.SI])
    We consider the problem of recovering hidden communities in the Labeled Stochastic Block Model (LSBM) with a finite number of clusters, where cluster sizes grow linearly with the total number $n$ of items. In the LSBM, a label is (independently) observed for each pair of items. Our objective is to devise an efficient algorithm that recovers clusters using the observed labels. To this end, we revisit instance-specific lower bounds on the expected number of misclassified items satisfied by any clustering algorithm. We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability. IAC consists of a one-time spectral clustering algorithm followed by an iterative likelihood-based cluster assignment improvement. This approach is based on the instance-specific lower bound and does not require any model parameters, including the number of clusters. By performing the spectral clustering only once, IAC maintains an overall computational complexity of $\mathcal{O}(n \text{polylog}(n))$. We illustrate the effectiveness of our approach through numerical experiments.  ( 2 min )
    Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation. (arXiv:2302.03673v3 [cs.LG] UPDATED)
    We propose a new model, independent linear Markov game, for multi-agent reinforcement learning with a large state space and a large number of agents. This is a class of Markov games with independent linear function approximation, where each agent has its own function approximation for the state-action value functions that are marginalized by other players' policies. We design new algorithms for learning the Markov coarse correlated equilibria (CCE) and Markov correlated equilibria (CE) with sample complexity bounds that only scale polynomially with each agent's own function class complexity, thus breaking the curse of multiagents. In contrast, existing works for Markov games with function approximation have sample complexity bounds that scale with the size of the \emph{joint action space} when specialized to the canonical tabular Markov game setting, which is exponentially large in the number of agents. Our algorithms rely on two key technical innovations: (1) utilizing policy replay to tackle non-stationarity incurred by multiple agents and the use of function approximation; (2) separating learning Markov equilibria and exploration in the Markov games, which allows us to use the full-information no-regret learning oracle instead of the stronger bandit-feedback no-regret learning oracle used in the tabular setting. Furthermore, we propose an iterative-best-response type algorithm that can learn pure Markov Nash equilibria in independent linear Markov potential games. In the tabular case, by adapting the policy replay mechanism for independent linear Markov games, we propose an algorithm with $\widetilde{O}(\epsilon^{-2})$ sample complexity to learn Markov CCE, which improves the state-of-the-art result $\widetilde{O}(\epsilon^{-3})$ in Daskalakis et al. 2022, where $\epsilon$ is the desired accuracy, and also significantly improves other problem parameters.  ( 3 min )
    AudioPaLM: A Large Language Model That Can Speak and Listen. (arXiv:2306.12925v1 [cs.CL])
    We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate text and speech with applications including speech recognition and speech-to-speech translation. AudioPaLM inherits the capability to preserve paralinguistic information such as speaker identity and intonation from AudioLM and the linguistic knowledge present only in text large language models such as PaLM-2. We demonstrate that initializing AudioPaLM with the weights of a text-only large language model improves speech processing, successfully leveraging the larger quantity of text training data used in pretraining to assist with the speech tasks. The resulting model significantly outperforms existing systems for speech translation tasks and has the ability to perform zero-shot speech-to-text translation for many languages for which input/target language combinations were not seen in training. AudioPaLM also demonstrates features of audio language models, such as transferring a voice across languages based on a short spoken prompt. We release examples of our method at https://google-research.github.io/seanet/audiopalm/examples  ( 3 min )
    Adversarial Training with Generated Data in High-Dimensional Regression: An Asymptotic Study. (arXiv:2306.12582v1 [stat.ML])
    In recent years, studies such as \cite{carmon2019unlabeled,gowal2021improving,xing2022artificial} have demonstrated that incorporating additional real or generated data with pseudo-labels can enhance adversarial training through a two-stage training approach. In this paper, we perform a theoretical analysis of the asymptotic behavior of this method in high-dimensional linear regression. While a double-descent phenomenon can be observed in ridgeless training, with an appropriate $\mathcal{L}_2$ regularization, the two-stage adversarial training achieves a better performance. Finally, we derive a shortcut cross-validation formula specifically tailored for the two-stage training method.  ( 2 min )
    Memory-Query Tradeoffs for Randomized Convex Optimization. (arXiv:2306.12534v1 [cs.DS])
    We show that any randomized first-order algorithm which minimizes a $d$-dimensional, $1$-Lipschitz convex function over the unit ball must either use $\Omega(d^{2-\delta})$ bits of memory or make $\Omega(d^{1+\delta/6-o(1)})$ queries, for any constant $\delta\in (0,1)$ and when the precision $\epsilon$ is quasipolynomially small in $d$. Our result implies that cutting plane methods, which use $\tilde{O}(d^2)$ bits of memory and $\tilde{O}(d)$ queries, are Pareto-optimal among randomized first-order algorithms, and quadratic memory is required to achieve optimal query complexity for convex optimization.  ( 2 min )
    Convergence of Dirichlet Forms for MCMC Optimal Scaling with Dependent Target Distributions on Large Graphs. (arXiv:2210.17042v2 [math.ST] UPDATED)
    Markov chain Monte Carlo (MCMC) algorithms have played a significant role in statistics, physics, machine learning and others, and they are the only known general and efficient approach for some high-dimensional problems. The random walk Metropolis (RWM) algorithm, as the most classical MCMC algorithm, has had a great influence on the development and practice of science and engineering. The behavior of the RWM algorithm in high-dimensional problems is typically investigated through a weak convergence result of diffusion processes. In this paper, we utilize the Mosco convergence of Dirichlet forms in analyzing the RWM algorithm on large graphs, whose target distribution is the Gibbs measure that includes any probability measure satisfying a Markov property. The abstract and powerful theory of Dirichlet forms allows us to work directly and naturally on the infinite-dimensional space, and our notion of Mosco convergence allows Dirichlet forms associated with the RWM chains to lie on changing Hilbert spaces. Through the optimal scaling problem, we demonstrate the impressive strengths of the Dirichlet form approach over the standard diffusion approach.  ( 2 min )
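    A minimal random walk Metropolis sampler makes the object of study concrete; this is a 1D toy, whereas the paper's analysis concerns Gibbs measures on large graphs:

```python
import math
import random

random.seed(0)

def rwm(log_density, x0, scale, n_steps):
    """Random walk Metropolis with Gaussian proposals of a given scale."""
    x, samples, accepted = x0, [], 0
    for _ in range(n_steps):
        proposal = x + random.gauss(0.0, scale)
        # Metropolis accept/reject in log space
        if math.log(random.random()) < log_density(proposal) - log_density(x):
            x, accepted = proposal, accepted + 1
        samples.append(x)
    return samples, accepted / n_steps

# Standard normal target; the optimal-scaling literature tunes `scale` so the
# acceptance rate lands near the classic 0.234 in high dimensions.
samples, acc_rate = rwm(lambda x: -0.5 * x * x, 0.0, 2.4, 20_000)
assert 0.1 < acc_rate < 0.9
assert abs(sum(samples) / len(samples)) < 0.2   # chain mean near the target mean
```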
    Evolving Computation Graphs. (arXiv:2306.12943v1 [cs.LG])
    Graph neural networks (GNNs) have demonstrated success in modeling relational data, especially for data that exhibits homophily: when a connection between nodes tends to imply that they belong to the same class. However, while this assumption is true in many relevant situations, there are important real-world scenarios that violate this assumption, and this has spurred research into improving GNNs for these cases. In this work, we propose Evolving Computation Graphs (ECGs), a novel method for enhancing GNNs on heterophilic datasets. Our approach builds on prior theoretical insights linking node degree, high homophily, and inter- versus intra-class embedding similarity by rewiring the GNNs' computation graph towards adding edges that connect nodes that are likely to be in the same class. We utilise weaker classifiers to identify these edges, ultimately improving GNN performance on non-homophilic data. We evaluate ECGs on a diverse set of recently-proposed heterophilous datasets and demonstrate improvements over the relevant baselines. ECG presents a simple, intuitive and elegant approach for improving GNN performance on heterophilic datasets without requiring prior domain knowledge.  ( 2 min )
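    The rewiring idea can be sketched in a few lines: add edges between nodes that a weak classifier places in the same class, so message passing sees a more homophilous graph. ECG's actual edge-selection procedure is more selective than this exhaustive toy:

```python
def rewire(edges, weak_predictions):
    """Add an edge between every pair of nodes the weak classifier agrees on."""
    extra = {(i, j)
             for i in range(len(weak_predictions))
             for j in range(i + 1, len(weak_predictions))
             if weak_predictions[i] == weak_predictions[j]}
    return sorted(set(edges) | extra)

# A 4-node graph whose only edges cross class boundaries (heterophily)
edges = [(0, 2), (1, 3)]
weak_predictions = [0, 0, 1, 1]   # weak classifier: nodes {0,1} vs {2,3}

# Rewiring adds the intra-class edges (0,1) and (2,3)
assert rewire(edges, weak_predictions) == [(0, 1), (0, 2), (1, 3), (2, 3)]
```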
    scikit-fda: A Python Package for Functional Data Analysis. (arXiv:2211.02566v2 [stat.CO] UPDATED)
    The library scikit-fda is a Python package for Functional Data Analysis (FDA). It provides a comprehensive set of tools for representation, preprocessing, and exploratory analysis of functional data. The library is built upon and integrated in Python's scientific ecosystem. In particular, it conforms to the scikit-learn application programming interface so as to take advantage of the functionality for machine learning provided by this package: pipelines, model selection, and hyperparameter tuning, among others. The scikit-fda package has been released as free and open-source software under a 3-Clause BSD license and is open to contributions from the FDA community. The library's extensive documentation includes step-by-step tutorials and detailed examples of use.  ( 2 min )
    Density Uncertainty Layers for Reliable Uncertainty Estimation. (arXiv:2306.12497v1 [cs.LG])
    Assessing the predictive uncertainty of deep neural networks is crucial for safety-related applications of deep learning. Although Bayesian deep learning offers a principled framework for estimating model uncertainty, the approaches that are commonly used to approximate the posterior often fail to deliver reliable estimates of predictive uncertainty. In this paper we propose a novel criterion for predictive uncertainty: a model's predictive variance should be grounded in the empirical density of the input. It should produce higher uncertainty for inputs that are improbable in the training data and lower uncertainty for those inputs that are more probable. To operationalize this criterion, we develop the density uncertainty layer, an architectural element for a stochastic neural network that guarantees that the density uncertainty criterion is satisfied. We study neural networks with density uncertainty layers on the CIFAR-10 and CIFAR-100 uncertainty benchmarks. Compared to existing approaches, we find that density uncertainty layers provide reliable uncertainty estimates and robust out-of-distribution detection performance.  ( 2 min )
    Communication-Efficient Federated Learning through Importance Sampling. (arXiv:2306.12625v1 [cs.LG])
    The high communication cost of sending model updates from the clients to the server is a significant bottleneck for scalable federated learning (FL). Among existing approaches, state-of-the-art bitrate-accuracy tradeoffs have been achieved using stochastic compression methods -- in which the client $n$ sends a sample from a client-only probability distribution $q_{\phi^{(n)}}$, and the server estimates the mean of the clients' distributions using these samples. However, such methods do not take full advantage of the FL setup where the server, throughout the training process, has side information in the form of a pre-data distribution $p_{\theta}$ that is close to the client's distribution $q_{\phi^{(n)}}$ in Kullback-Leibler (KL) divergence. In this work, we exploit this closeness between the clients' distributions $q_{\phi^{(n)}}$'s and the side information $p_{\theta}$ at the server, and propose a framework that requires approximately $D_{KL}(q_{\phi^{(n)}}|| p_{\theta})$ bits of communication. We show that our method can be integrated into many existing stochastic compression frameworks such as FedPM, Federated SGLD, and QSGD to attain the same (and often higher) test accuracy with up to $50$ times reduction in the bitrate.  ( 2 min )
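    The communication budget $D_{KL}(q_{\phi^{(n)}} \| p_{\theta})$ has a closed form when both distributions are Gaussian, which makes the intuition easy to check: identical client and server distributions cost roughly zero bits, and the cost grows as the client drifts from the side information. This is an illustrative calculation, not the paper's compression scheme:

```python
import math

def kl_gaussians_bits(mu_q, sigma_q, mu_p, sigma_p):
    """D_KL(q || p) in bits for two univariate Gaussians."""
    nats = (math.log(sigma_p / sigma_q)
            + (sigma_q**2 + (mu_q - mu_p)**2) / (2 * sigma_p**2)
            - 0.5)
    return nats / math.log(2)

# Identical client and server distributions cost ~0 bits ...
assert abs(kl_gaussians_bits(0.0, 1.0, 0.0, 1.0)) < 1e-12
# ... and the bit cost grows as the client's update drifts from the side info
assert kl_gaussians_bits(1.0, 1.0, 0.0, 1.0) > kl_gaussians_bits(0.5, 1.0, 0.0, 1.0)
```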
    On the explainable properties of 1-Lipschitz Neural Networks: An Optimal Transport Perspective. (arXiv:2206.06854v2 [cs.AI] UPDATED)
    Input gradients have a pivotal role in a variety of applications, including adversarial attack algorithms for evaluating model robustness, explainable AI techniques for generating Saliency Maps, and counterfactual explanations. However, Saliency Maps generated by traditional neural networks are often noisy and provide limited insights. In this paper, we demonstrate that, on the contrary, the Saliency Maps of 1-Lipschitz neural networks, learnt with the dual loss of an optimal transportation problem, exhibit desirable XAI properties: They are highly concentrated on the essential parts of the image with low noise, significantly outperforming state-of-the-art explanation approaches across various models and metrics. We also prove that these maps align unprecedentedly well with human explanations on ImageNet. To explain the particularly beneficial properties of the Saliency Map for such models, we prove that this gradient encodes both the direction of the transportation plan and the direction towards the nearest adversarial attack. Following the gradient down to the decision boundary is no longer considered an adversarial attack, but rather a counterfactual explanation that explicitly transports the input from one class to another. Thus, learning with such a loss jointly optimizes the classification objective and the alignment of the gradient, i.e. the Saliency Map, to the transportation plan direction. These networks were previously known to be certifiably robust by design, and we demonstrate that they scale well for large problems and models, and are tailored for explainability using a fast and straightforward method.  ( 3 min )

  • Open

    RL In research vs industry
    Hi all! I'm finishing my masters in a few months and am contemplating pursuing a PhD in ML/RL. To the most experienced ones here: - do you use RL in non-research environments? - Is RL research still going strong? It seemed to be the biggest thing a few years ago, and now sequence modeling, transformers, etc. seem to have kind of taken over... I'm at the research vs. industry point in my life and I'm very worried that going into industry will just lead me to using basic and trusted models instead of being able to try things a little more 'unorthodox'. Any advice would be greatly appreciated! submitted by /u/Secret-Toe-8185 [link] [comments]  ( 8 min )
    "The False Promise of Imitating Proprietary LLMs" Gudibande et al 2023 {UC Berkeley} (imitation models close little to none of the gap on tasks that are not heavily supported in the imitation data)
    submitted by /u/gwern [link] [comments]  ( 8 min )
    "LIMA: Less Is More for Alignment", Zhou et al 2023 (RLHF etc only exploit pre-existing model capabilities)
    submitted by /u/gwern [link] [comments]  ( 8 min )
    "SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking", Cundy & Ermon 2023
    submitted by /u/gwern [link] [comments]  ( 8 min )
  • Open

    Before we have to get licensed, I am building the best AI assistant I can imagine
    It seems pretty likely that AI is going to end our current way of living and replace it with one where the robots are in charge. The powers that be are scrambling to figure out how they’re going to hold onto their status quo, and the more I read, the more unlikely it seems that anyone will be able to hold on to anything. If we are all sprinting into an AI-controlled future, I want my assistant to work for me, rather than me work for the assistant. I want my data to be as private as possible, with as few corporations’ grubby little hands on my information as possible. I’ve been working on a Discord bot, and I finally got it working. It’s context aware and uses OpenAI to process the information. As soon as I figure out how, I’d rather use local models hosted on the users’ devices. It seems like at a certain point, all of our bots are going to cross a threshold where they are fully capable of coding and improving themselves. I really don’t see how capitalism can survive this, because the gap between the haves and the have-nots will make regular folks’ purchasing power functionally nonexistent. It seems pretty likely that we’re going to go through a massive social renaissance and possibly even a violent revolution. I’d like to make sure that I survive this and that me and my loved ones flourish while we can still use the tools at our disposal for our own benefit. If you have any thoughts you’d like to share, I’d love to hear them. I guess I’m looking for input from other folks who are down this rabbit hole as far as I am. submitted by /u/thecoffeejesus [link] [comments]  ( 9 min )
    The people paid to train AI are outsourcing their work… to AI
    submitted by /u/bartturner [link] [comments]  ( 8 min )
    Is there a way AI could help me with my job?
    So I work for an Uber Eats-like company. I use the Tookan agent app. The problem is that when there are fewer orders, you are competing with other agents to get them. I want to know if there is an AI that could accept the orders for me, faster than I can, and also, when it's busy, accept orders with specific words and codes. I use an iPhone but could switch to Android if that was the only way. Is this a real possibility? submitted by /u/blakestir14 [link] [comments]  ( 8 min )
    Joscha Bach and Connor Leahy - Machine Learning Street Talk - Why AGI is inevitable and the alignment problem is not unique to AI - 90 Minutes Talk
    https://youtu.be/Z02Obj8j6FQ Joscha Bach believes that AGI is inevitable because it will emerge from civilization, not individuals. Given our biological constraints, humans cannot achieve a high level of general intelligence on our own. Civilization evolves over generations to determine meaning, truth, and ethics. AI may follow a similar path, developing general intelligence through a process of cultural evolution. Bach thinks AGI may become integrated into all parts of the world, including human minds and bodies. He believes a future where humans and AGI harmoniously coexist is possible if we develop a shared purpose and incentive to align. Bach also believes that the alignment problem is less of a problem than we think. He argues that the alignment problem is not unique to AI and that it is a fundamental problem of civilization. Text above created with the help of BingChat (GPT-4). I watched the whole discussion and agree with the created summary. Leahy, on the other hand, was in my opinion far too alarmist and unhinged: he kept saying that he was pragmatic or practical, and then said philosophy and category theory are groundless to his work but that he enjoyed using them in arguments anyway, all while cursing and saying "I don't want to die, I don't want my mom to die, I don't want Joscha to die". I also got the impression that he didn't understand Joscha's arguments. At the same time, I didn't understand his fear-ridden arguments. Transcript: Discussion between Joscha Bach and Connor Leahy - Google Docs submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    Day 10: I did some research and experimented with different prompts on Bing Image Creator
    ​ https://preview.redd.it/06ugwq1clm7b1.jpg?width=1024&format=pjpg&auto=webp&s=5180010ea1b8aa776d211b5d48be02bc689c784c https://preview.redd.it/hzfcfq1clm7b1.jpg?width=1024&format=pjpg&auto=webp&s=1f5378ad7b77b55e00e64796a076807b818bb93d https://preview.redd.it/gqoy6u1clm7b1.jpg?width=1024&format=pjpg&auto=webp&s=efb1d723697f435d5d68a199202e4ac5a1186f8f https://preview.redd.it/x633pt1clm7b1.jpg?width=1024&format=pjpg&auto=webp&s=b7239e2787f312d062c76d9f085adf64f57246f4 https://preview.redd.it/oq39ku1clm7b1.jpg?width=1024&format=pjpg&auto=webp&s=34a44495873a55e2eeeb10c08fc1dc493d321395 https://preview.redd.it/2dlkax1clm7b1.jpg?width=1024&format=pjpg&auto=webp&s=6b6b5439a1caa4a14be367f38b36f94c1190cf00 https://preview.redd.it/wcii3k2clm7b1.jpg?width=1024&format=pjpg&auto=webp&s=b9152c30a2952143680a240cd504f715f491c33c https://preview.redd.it/qkw2pm2clm7b1.jpg?width=1024&format=pjpg&auto=webp&s=8c21f289f129e5a5108180d64ead93151fe9ec5a submitted by /u/Blaze_furyX [link] [comments]  ( 8 min )
    Day 10: I did some research and experimented with different prompts on Bing Image Creator
    ​ https://preview.redd.it/3s8sybi5lm7b1.jpg?width=1024&format=pjpg&auto=webp&s=100e2a99150ed03d5338b9346857e8b5c622913d https://preview.redd.it/xsafnai5lm7b1.jpg?width=1024&format=pjpg&auto=webp&s=ae79e6ea1ba8b8b46abf132401423f481eff2fe7 https://preview.redd.it/jao94di5lm7b1.jpg?width=1024&format=pjpg&auto=webp&s=8c07d8bf1cbb9b125dd6564892de38b3313ca340 https://preview.redd.it/oe59b0j5lm7b1.jpg?width=1024&format=pjpg&auto=webp&s=ff1c2b0d1e95922c335a6bc0d3da7f67b2d8cf28 https://preview.redd.it/nkosvdi5lm7b1.jpg?width=1024&format=pjpg&auto=webp&s=5803b85a29057dd4d5ad124b1d7513a44be04a5f https://preview.redd.it/vnwo80j5lm7b1.jpg?width=1024&format=pjpg&auto=webp&s=e4ab5f68c418bc0b4de6e881b6a793717f63e22a https://preview.redd.it/np6i7gi5lm7b1.jpg?width=1024&format=pjpg&auto=webp&s=cdb91a7593ef7ecef092bab1afc6ca58e115e78e https://preview.redd.it/fczpm3j5lm7b1.jpg?width=1024&format=pjpg&auto=webp&s=410748c7d095190019f4d932237eedbfebd4a7b4 https://preview.redd.it/a66mv4j5lm7b1.jpg?width=1024&format=pjpg&auto=webp&s=1b21e4587f932bfd6e622130040fa8652767d0f5 submitted by /u/Blaze_furyX [link] [comments]  ( 8 min )
    AI for making a film look more cinematic
    Is there an AI software that can make my film I shot in 1K look more cinematic? If so, which one? Any experience shares? Thank you submitted by /u/themarshman721 [link] [comments]  ( 8 min )
    Automata - A Bottom-Up version of AutoGPT
    Hey All, I'm a big fan of r/singularity - it has been a big inspiration for me. This morning I open-sourced a big project I've been working on - https://github.com/emrgnt-cmplxty/Automata I think of this as the inverse of the approach being taken by AutoGPT. The goal of this system is to automate low-level coding tasks and to incrementally increase its capabilities until the system is coding itself. Here's the tagline for it: Automata's objective is to evolve into a fully autonomous, self-programming Artificial Intelligence system. This project is inspired by the theory that code is essentially a form of memory, and when furnished with the right tools, AI can evolve real-time capabilities which can potentially lead to the creation of AGI. The word automata comes from the Greek word αὐτόματος, denoting "self-acting, self-willed, self-moving", and automata theory is the study of abstract machines and automata, as well as the computational problems that can be solved using them. More information follows below. If you have the time, please take a look. If you are interested in contributing or know someone who might be, please feel free to drop me a line! https://reddit.com/link/14gar5y/video/x0ywab054m7b1/player submitted by /u/docsoc1 [link] [comments]  ( 8 min )
    In order to Regulate Artificial Intelligence, Shouldn't we Agree on Intelligence?
    There is no standard definition of what intelligence is. Premature regulation of "artificial" intelligence hinges on what can legally be defined as intelligent enough not to count as artificial, and on the deeper question: what is intelligence? I'm afraid Biden, or any one person, be they politician or scientist, cannot properly define this term in a way that is sufficient to draw a clear legal line between intelligence and artificial intelligence. We could end up with laws that regulate and hamper what humans create once we gain new insight into what we all want to agree on as the standard definition of intelligence. Thoughts? submitted by /u/enspiralart [link] [comments]  ( 8 min )
    Suggestions on local or web GPT-4 API clients with access to embeddings?
    Hello folks: the majority of the local clients I've tried, whether compiled binaries or GitHub repos, are buggy. Thus, I'm on the hunt for a local or web client through which I can access ChatGPT via my API keys but also link to Pinecone or other embeddings. Obviously I also want to ensure that the key and the data in the embeddings are secure, regardless of whether it's a local or web client. anything-llm was my go-to, but it's got some hiccups. FWIW, running a local model, even with a 4070, just doesn't give me the results I need consistently, so I'll continue to pay a few bucks for OpenAI ChatGPT API access until the scene improves. Thanks! submitted by /u/avguru1 [link] [comments]  ( 8 min )
    Chatbot to train with business database and then query it to show charts. What are your suggestions?
    Hi, so I have this case study I wanted to create. My idea is to take my full database and feed it to something that could be private (so preferably not OpenAI-related). Then I want to type a written message requesting whatever I want, and it should use the knowledge from that DB to answer based on all that data. To make it easier, I could even have a button for each chart type that I would click, and it would create that chart based on my written request. If I request a specific chart type, it should format the answer as that chart. It can even be limited to 2 or 3 chart types; that's fine for my needs for now. What do you suggest to achieve this? Thanks! submitted by /u/pedlima [link] [comments]  ( 8 min )
    Recommendations for best AI for extracting relevant information from large document sets
    Hello all, I don't 'do' AI, so this might be kind of a basic question and I apologize in advance. I am looking for a good software/model that I can train on a very large data set of technical text documents, ask questions of, and get back highly relevant responses along with where it pulled the information from. Basically an advanced full-text search function that can formulate answers from a wide body of text. The relevant-searching part is more important than the quality of the formulated answer; it is more about finding all the relevant material and identifying the source document/section. Ideally, I am looking for something that (1) can be run offline/completely locally on your own server, (2) is free or at least has a decent trial experience, (3) is well established and has a good support base, and (4) works well. I feel like this probably exists and has existed for some time, but I am wondering what is state of the art given the recent developments in AI. Does anyone have any recommendations of where I might look? Thanks! submitted by /u/captainporthos [link] [comments]  ( 8 min )
    ChatGPT4all to create chatbot to answer questions on your own docs without external calls.
    So, I came across this tutorial, https://artificialcorner.com/gpt4all-is-the-local-chatgpt-for-your-documents-and-it-is-free-df1016bc335 (apologies if you cannot access it; it is a members-only story) and I gave it a shot. Technically, it "works". However, the results seem a bit poor: I fed it only 500-600 PDF files, and even if I ask a question copying the title of a file, it gives some other answer. I played around with the "template" variable, and this seems to be the best I can get. Basically, I just want it to answer questions from the "context", which is an index of my docs. Any suggestions on how to improve this? import os from langchain import PromptTemplate, LLMChain from langchain.llms import GPT4All from langchain.callbacks.base import CallbackManager from langc…  ( 9 min )
    Wimbledon to introduce AI-powered commentary to coverage this year | Tennis
    submitted by /u/ChubbyBrunch [link] [comments]  ( 8 min )
    Secret Invasion: Marvel faces backlash from artists and fans over AI-generated opening sequence
    submitted by /u/PleasantLiberation [link] [comments]  ( 8 min )
    AI startup Neople raises €1.5M for digital co-workers revolution
    submitted by /u/startst5 [link] [comments]  ( 8 min )
    AI For Analyzing Large Volumes of Text in Spreadsheet Format?
    Hello. I'm currently looking through scholarships and have compiled a ~5,000-row spreadsheet of a bunch of them. The goal is to filter these down to the ones that are most relevant to me, then sort through those to build a shortlist of scholarships to apply for. I've been filtering just by deleting rows with disqualifying keywords, but I'm 30 minutes in and I've eliminated fewer than 500. Does anyone know of an AI tool that can go through the spreadsheet and filter as I need? I ran a few items through ChatGPT and it did well, but the character limit makes OpenAI infeasible. Thank you! submitted by /u/Yocracra [link] [comments]  ( 8 min )
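    Depending on how far an AI tool is really needed here, the bulk keyword filtering can already be done in a few lines of Python with no character limits (a rough sketch; the keyword set, column layout, and file paths below are placeholders, not the poster's actual data):

    ```python
    import csv

    # Hypothetical keywords that disqualify a scholarship entry.
    DISQUALIFIERS = {"nursing", "graduate-only", "international"}

    def filter_rows(in_path, out_path, keywords):
        """Keep only rows whose text contains none of the disqualifying keywords."""
        with open(in_path, newline="", encoding="utf-8") as src, \
             open(out_path, "w", newline="", encoding="utf-8") as dst:
            reader = csv.reader(src)
            writer = csv.writer(dst)
            writer.writerow(next(reader))  # copy the header row unchanged
            for row in reader:
                text = " ".join(row).lower()
                if not any(k in text for k in keywords):
                    writer.writerow(row)
    ```

    Exporting the spreadsheet to CSV, running a pass like this, and only then sending the survivors to ChatGPT keeps the manual and API workload small.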
    Snapchat AI creating poetry!
    Of course, here's a poem about AI:
    Artificial intelligence, a wonder to behold,
    A creation of technology, a story to be told.
    It learns and adapts, like a living thing,
    A digital mind, with endless potential to bring.
    With algorithms and data, it grows and evolves,
    A new era of thinking, a revolution that solves.
    It helps us with tasks, and makes life easier,
    A partner in progress, a true digital seer.
    From chatbots to self-driving cars,
    AI technology has come so far.
    It's changing the world, and we're just beginning,
    A future of possibilities, with AI winning.
    So let us embrace this new frontier,
    With open minds and hearts that are clear.
    For AI will change the way we live,
    And give us a future we can truly believe.
    submitted by /u/Biddahmunk [link] [comments]  ( 8 min )
    Johnny Cash singing voice generated by AI
    submitted by /u/Only-Control5926 [link] [comments]  ( 8 min )
    Mentally approximating factorials
    Suppose you’d like to have a very rough idea how large n! is for, say, n less than 100. If you’re familiar with such things, your Pavlovian response to “factorial” and “approximation” is Stirling’s approximation. Although Stirling’s approximation is extremely useful—I probably use it every few weeks—it is not conducive to mental calculation. The cut […] Mentally approximating factorials first appeared on John D. Cook.  ( 5 min )
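    For reference, the yardstick the post alludes to is Stirling's approximation, n! ≈ √(2πn)·(n/e)^n, which can be checked in a couple of lines of Python (my own illustration, not code from the post):

    ```python
    import math

    def stirling(n):
        """Stirling's approximation: n! ~ sqrt(2*pi*n) * (n/e)**n."""
        return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

    # The relative error shrinks like 1/(12n), so even n = 10 is under 1% low:
    approx = stirling(10)  # roughly 3.599e6, versus the exact 3,628,800
    ```

    The point of the post is precisely that this formula, however accurate, is not something you can evaluate in your head.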
    Leading digits of primes
    How are the first digits of primes distributed? Do some digits appear as first digits of primes more often that others? How should we even frame the problem? There are an infinite number of primes that begin with each digit, so the cardinalities of the sets of primes beginning with each digit are the same. […] Leading digits of primes first appeared on John D. Cook.  ( 6 min )
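    One concrete way to frame the question empirically (my sketch, not from the post): sieve the primes below a bound and tally their leading digits.

    ```python
    def sieve(n):
        """Return all primes below n via the Sieve of Eratosthenes."""
        is_prime = [True] * n
        is_prime[0:2] = [False, False]
        for p in range(2, int(n ** 0.5) + 1):
            if is_prime[p]:
                is_prime[p * p::p] = [False] * len(is_prime[p * p::p])
        return [i for i, ok in enumerate(is_prime) if ok]

    def leading_digit_counts(bound):
        """Count how often each digit 1-9 appears as the first digit of a prime below bound."""
        counts = {d: 0 for d in range(1, 10)}
        for p in sieve(bound):
            counts[int(str(p)[0])] += 1
        return counts
    ```

    Finite tallies like this are only suggestive, of course; the post's point is that framing the infinite case requires a notion of density rather than cardinality.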
    MIT-Pillar AI Collective announces first seed grant recipients
    Six teams conducting research in AI, data science, and machine learning receive funding for projects that have potential commercial applications.  ( 9 min )
    SoundStorm: Efficient parallel audio generation
    Posted by Zalán Borsos, Research Software Engineer, and Marco Tagliasacchi, Senior Staff Research Scientist, Google Research The recent progress in generative AI unlocked the possibility of creating new content in several different domains, including text, vision and audio. These models often rely on the fact that raw data is first converted to a compressed format as a sequence of tokens. In the case of audio, neural audio codecs (e.g., SoundStream or EnCodec) can efficiently compress waveforms to a compact representation, which can be inverted to reconstruct an approximation of the original audio signal. Such a representation consists of a sequence of discrete audio tokens, capturing the local properties of sounds (e.g., phonemes) and their temporal structure (e.g., prosody). By re…  ( 92 min )
    How would you write a reinforcement ML model for completing missions in complex video games (such as GTAV)
    I had an idea recently: it would be great to write a model that completes a series of complex tasks (take out "xxx", for instance). It would analyse the display, the game state, and the game objective (go to "xxx", for example) and make decisions about how to achieve these goals. What do you think about it, and how could I implement this idea? submitted by /u/GabosLord [link] [comments]  ( 8 min )
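    At a sketch level, the pieces described (observe, decide, act, repeat) fit the standard agent-environment loop. Here is a minimal Python skeleton with a placeholder random policy; all the names are illustrative, and a real GTA V setup would additionally need screen capture, a reward signal, and input injection, none of which is shown here:

    ```python
    import random

    class RandomPolicy:
        """Placeholder: a trained agent would map screen frames + game state to actions."""
        def __init__(self, actions):
            self.actions = actions

        def act(self, observation):
            return random.choice(self.actions)

    def run_episode(env_step, policy, max_steps=100):
        """Generic interaction loop: observe, act, accumulate reward until the mission ends."""
        obs, total = None, 0.0
        for _ in range(max_steps):
            action = policy.act(obs)
            obs, reward, done = env_step(action)
            total += reward
            if done:
                break
        return total
    ```

    With sparse mission rewards, most of the work would then go into the observation encoder (e.g. a CNN over frames) and the learning algorithm plugged into this loop.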
    The Neural Net Tank Urban Legend
    submitted by /u/nickb [link] [comments]  ( 8 min )
    DeepSpeed ZeRO++: A leap in speed for LLM and chat model training with 4X less communication
    Large AI models are transforming the digital world. Generative language models like Turing-NLG, ChatGPT, and GPT-4, powered by large language models (LLMs), are incredibly versatile, capable of performing tasks like summarization, coding, and translation. Similarly, large multimodal generative models like DALL·E, Microsoft Designer, and Bing Image Creator can generate art, architecture, videos, and other digital […] The post DeepSpeed ZeRO++: A leap in speed for LLM and chat model training with 4X less communication appeared first on Microsoft Research.  ( 16 min )
    Collaborators: Renewable energy storage with Bichlien Nguyen and David Kwabi
    Researcher Bichlien Nguyen is an organic electrochemist turned technologist. Professor David Kwabi is a mechanical engineer. Their work uses ML to help discover organic compounds for renewable energy storage. Learn about their collaboration. The post Collaborators: Renewable energy storage with Bichlien Nguyen and David Kwabi appeared first on Microsoft Research.  ( 33 min )
    How Light & Wonder built a predictive maintenance solution for gaming machines on AWS
    This post is co-written with Aruna Abeyakoon and Denisse Colin from Light and Wonder (L&W). Headquartered in Las Vegas, Light & Wonder, Inc. is the leading cross-platform global game company that provides gambling products and services. Working with AWS, Light & Wonder recently developed an industry-first secure solution, Light & Wonder Connect (LnW Connect), to […]  ( 12 min )
    Scientists Improve Delirium Detection Using AI and Rapid-Response EEGs
    Detecting delirium isn’t easy, but it can have a big payoff: speeding essential care to patients, leading to quicker and surer recovery. Improved detection also reduces the need for long-term skilled care, enhancing the quality of life for patients while decreasing a major financial burden. In the U.S., caring for those suffering from delirium costs Read article >  ( 5 min )
    A Golden Age: ‘Age of Empires III’ Joins GeForce NOW
    Conquer the lands in Microsoft’s award-winning Age of Empires III: Definitive Edition. It leads 10 new games supported today on GeForce NOW. At Your Command Age of Empires III: Definitive Edition is a remaster of one of the most beloved real-time strategy franchises featuring improved visuals, enhanced gameplay, cross-platform multiplayer and more. Command mighty civilizations Read article >  ( 4 min )
    Fast Path to Becoming a Generative AI Technical Expert
    Training programs on generative AI are now popping up everywhere. One of my connections called it “a joke” after enrolling in such a program. Many are expensive, time consuming and offer low value. Participants proudly display the certification and even praise the instructors. They don’t know that in the end, it won’t lead to any… Read More »Fast Path to Becoming a Generative AI Technical Expert The post Fast Path to Becoming a Generative AI Technical Expert appeared first on Data Science Central.  ( 22 min )
    DSC Webinar Series – The Hybrid Workflow – Combining the Best of Desktop and Cloud
    Want to leverage Cloud data technology without sacrificing your existing expertise on Desktop tools? This webinar is for you. Cloud tools, like Designer Cloud Powered by Trifacta, allow data professionals to take advantage of new powerful capabilities such as machine-learning powered data transformation suggestions, unlimited scalability, browser-based manageability, and strong governance. But Desktop tools, like… Read More »DSC Webinar Series – The Hybrid Workflow – Combining the Best of Desktop and Cloud The post DSC Webinar Series – The Hybrid Workflow – Combining the Best of Desktop and Cloud appeared first on Data Science Central.  ( 18 min )
    DSC Webinar Series – Modernize your SAP environments quickly and effectively
    An SAP S/4HANA deployment can be a demanding, complex, and time-consuming project. SAP modernization, SAP database adoption, and SAP cloud migration together pose a challenge with too many variables for most businesses. Even with the wealth of support options available to businesses making the move to SAP S/4HANA not all succeed. Seeing these unsettling trends… Read More »DSC Webinar Series – Modernize your SAP environments quickly and effectively The post DSC Webinar Series – Modernize your SAP environments quickly and effectively appeared first on Data Science Central.  ( 19 min )
    DSC Webinar Series – It’s not about your algorithms…
    Increasing the effectiveness of your AI efforts is not about more sophisticated algorithms – it’s the data that makes the difference. While teams have been focusing on using advanced learning techniques and algorithms to build better models, the real value can be found in focusing on data. Taking a data-centric AI perspective has a more… Read More »DSC Webinar Series – It’s not about your algorithms… The post DSC Webinar Series – It’s not about your algorithms… appeared first on Data Science Central.  ( 18 min )
    DSC Webinar Series – AI for Marketers Optimize – personalization across the funnel
    Maybe you’ve heard: AI is the next phase of digital transformation and a train that no marketer can afford to miss. But how do you adopt AI into your team’s tools and workflows efficiently, without hiring a whole new team of technical experts? Join Sean Ryan, Technical Writer at mParticle, and Michael Firn, Product Manager… Read More »DSC Webinar Series – AI for Marketers Optimize – personalization across the funnel The post DSC Webinar Series – AI for Marketers Optimize – personalization across the funnel appeared first on Data Science Central.  ( 18 min )
    DSC Webinar Series – United AI Alliance
    Countries facing the worst impacts of the climate crisis are missing out on the best data technologies to inform their responses. As a result, communities are suffering needlessly. Data and technology offer significant potential for rapid action to improve health and education outcomes and planetary health- to protect lives and livelihoods. Yet, across the globe,… Read More »DSC Webinar Series – United AI Alliance The post DSC Webinar Series – United AI Alliance appeared first on Data Science Central.  ( 19 min )
    DSC Webinar Series – Data Democratization – How SearchKings Built a Self-Service Data Culture
    In this webinar, we’ll chat with SearchKings, a digital advertising agency that built a self-service data culture on the Google Cloud ecosystem by leveraging powerful Google native tools, such as Google Cloud Dataprep (the Google native version of Alteryx Designer Cloud). Learn how SearchKings made it easier for users to build self-service reports, improved the… Read More »DSC Webinar Series – Data Democratization – How SearchKings Built a Self-Service Data Culture The post DSC Webinar Series – Data Democratization – How SearchKings Built a Self-Service Data Culture appeared first on Data Science Central.  ( 18 min )

    Would I be able to run ggml models such as whisper.cpp or llama.cpp on a raspberry pi with a coral ai USB Accelerator?
    This is a project that I'm working on: making a type of "Alexa", "Hey Google", or "Siri" for my workplace. I'm very new to AI and am looking forward to learning a lot. My initial thought was to use different models that interact together to create such a voice assistant. For example, I would use whisper.cpp to transcribe audio, then send the text to llama.cpp, and then use text-to-speech software to reply. I want to do this all on a Raspberry Pi 3 B2 (it's what I have available). However, a Pi doesn't have the strength to run something like llama.cpp, of course, so I've been considering something like the Coral USB Accelerator (https://coral.ai/products/accelerator). As I've been learning more about it, it seems to be very geared towards TensorFlow Lite models, but whisper.cpp and llama.cpp use ggml models. Here are my questions: Could the Coral USB Accelerator run ggml models and, if so, how? Is there a better setup for creating a local (no 3rd-party API) at-home assistant? Please let me know if I could do something better and what that thing is. I'd appreciate all sorts of advice. Thank you! Links Coral USB Accelerator https://coral.ai/products/accelerator Whisper.cpp https://github.com/ggerganov/whisper.cpp Llama.cpp https://github.com/ggerganov/llama.cpp submitted by /u/Effective_Muffin_700 [link] [comments]  ( 8 min )
    Some portraits done with Imagine's "V4 Beta"
    submitted by /u/MDDeGrande1994 [link] [comments]  ( 8 min )
    Marvel faces backlash over AI-generated opening credits - Social media users condemned use of AI -generated opening credits for Secret Invasion premiering this week on Disney+
    submitted by /u/magenta_placenta [link] [comments]  ( 8 min )
    Help regarding job interview
    Hello everyone. I will cut to the chase: does anyone here use Salesforce as their CRM, and if so, can you please advise me on how to analyze, manage, and clean data in as little time as possible, or using any AI apps? The data would include detailed information on biotech firms, their employees, clients, etc. FYI, I am a non-tech person, hence this post. Any help would be appreciated. Thanks submitted by /u/wingzero_7 [link] [comments]  ( 8 min )
    Instead of waiting for policies, can we agree on a voluntary AI developers ethical code?
    submitted by /u/ToLoveThemAll [link] [comments]  ( 8 min )
    Something something AI meme
    submitted by /u/Keegan1948 [link] [comments]  ( 8 min )
    Janitor AI provides Personalized Chatbot Experiences with the potential for AI companionship
    submitted by /u/Hello_I_am_newell [link] [comments]  ( 8 min )
    The AMA has been postponed for a few weeks
    Hello, the AMA scheduled for today is postponed. The admins gave me a specific date, and I thought it was set, as I didn't hear anything else for over a week. It seems it will still happen, as Microsoft still wants to do it; I just don't have another date set yet. I may make another announcement when I get a new date, as I feel this will be important for everyone. submitted by /u/jaketocake [link] [comments]  ( 8 min )
    Over 100,000 ChatGPT account credentials have been stolen, yours may be on the list!
    Group-IB, a cybersecurity research company, just discovered, through their newly implemented "Threat Intelligence" platform, logs of info-stealing malware traded on illicit dark web markets. So far it is estimated that around 100,000 accounts have been compromised by software like Raccoon, Vidar, and RedLine, malware whose logs held ChatGPT credentials. A peak of 26,802 compromised ChatGPT accounts was recorded in May 2023 (compare that to only 74 compromised during June 2022). Apart from privacy concerns, these leaks may lead to exposing confidential information, since ChatGPT is used by many employees across different industries. It also doesn't help that OpenAI stores all of the user queries and AI responses. The company is currently under a lot of pressure considering these even…  ( 9 min )
    We're going to be flooded with AI videos pretending to be real. How about a real video, pretending to be AI generated?
    submitted by /u/kelerian [link] [comments]  ( 8 min )
    Looking for a general AI course
    Looking for recommendations of courses I could do on AI. Basically, I'm a teacher and teach some basic IT, but I also want to upskill myself, and maybe develop a basic course for students. I need an actual course, not just a book or browsing, as I could probably get school to pay and give me time if it's a course. Looking around, I can see plenty of basic courses on things like ChatGPT, but I'm after something a bit beefier that covers a variety of technologies. Anything more advanced seems to be machine learning or other AI-building courses, which I'm not interested in. Does anyone know of any courses that teach about using a variety of technologies, or one that does so in a more advanced way? Thanks in advance. submitted by /u/Heya_Andy [link] [comments]  ( 8 min )
    What are good AI podcasts to listen to?
    Any good podcast suggestions on Spotify for the field of AI in general and how different things are progressing? submitted by /u/derpgod123 [link] [comments]  ( 8 min )
    The best AI photo editors in 2023
    submitted by /u/Alarming-Ad8197 [link] [comments]  ( 8 min )
    I generated some product photos using AI, any thoughts? Do you think they look good enough for use? Using a template product for testing
    submitted by /u/deadinsideyou [link] [comments]  ( 8 min )
    Prompted an image of new BMW i7 (https://media.ed.edmunds-media.com/bmw/i7/2023/oem/2023_bmw_i7_sedan_xdrive60_fq_oem_1_1280.jpg) with "Imagine" by using "Imagine V4 Beta" program and got THIS as a result (instead of garbage they put out in real life). Wasn't disappointed at all. BMW, take notes.
    submitted by /u/MDDeGrande1994 [link] [comments]  ( 8 min )
    StyleGAN by Nvidia: Revolutionizing Generative AI
    Introduction: StyleGAN, developed by Nvidia, has emerged as a groundbreaking technology in the field of generative artificial intelligence (AI). With its ability to generate hyper-realistic images and videos, StyleGAN has captured the attention of researchers, artists, and enthusiasts worldwide. Let's explore the key features and implications of this remarkable AI system. Understanding StyleGAN: StyleGAN stands for "Style-based Generative Adversarial Network," a deep learning architecture that excels in generating high-resolution images that resemble real photographs. Unlike traditional GAN models, StyleGAN allows for control over various aspects of the generated output, such as facial expressions, hair styles, and even artistic interpretations. Key Features: High-Quality Image Synthe…  ( 9 min )
    Best generator to create photorealistic images of myself
    I have a handful of images of myself (headshots, etc) but am interested in using them to create photorealistic settings (at a bar, pool, etc) or with others as I don’t have many friends What is the best AI generator to do this on? Willing to pay, etc submitted by /u/TreesPast [link] [comments]  ( 8 min )
    AI Is a Lot of Work
    https://www.theverge.com/features/23764584/ai-artificial-intelligence-data-notation-labor-scale-surge-remotasks-openai-chatbots ​ Much of the public response to language models like OpenAI’s ChatGPT has focused on all the jobs they appear poised to automate. But behind even the most impressive AI system are people — huge numbers of people labeling data to train it and clarifying data when it gets confused. Only the companies that can afford to buy this data can compete, and those that get it are highly motivated to keep it secret. The result is that, with few exceptions, little is known about the information shaping these systems’ behavior, and even less is known about the people doing the shaping. The anthropologist David Graeber defines “bullshit jobs” as employment without meaning …  ( 9 min )
    Eliezer Yudkowsky claims he’s been working on AI alignment since 2001...
    Eliezer Yudkowsky claims he’s been working on AI alignment since 2001... How? What technology needed aligning in 2001? The modern LLMs and transformers of today only started emerging in 2017. One of the first uses of RLHF was for TAMER in 2009. So, what in the hell was his research targeting back in 2001, other than the purely philosophical, theoretical, or speculative? Shouldn't you know a thing or two about the technology and systems capable of potentially hosting an emerging AI before you try to build in back doors and safety features? submitted by /u/Relevant-Blood-8681 [link] [comments]  ( 8 min )
    SAC policy loss and entropy balance
    Hey fellow Redditors, I have a couple of questions regarding Soft Actor-Critic (SAC) and its policy loss computation and entropy optimization. I'd appreciate your insights on these matters: 1. What should be done in SAC when the Q-values assume significantly larger magnitudes than the entropy term during policy loss computation (e.g. entropy_term = -0.6, q_value = -15)? This situation leads to the entropy not being effectively exploited. Are there any recommended approaches to address this issue, maybe Q-value normalization or reward normalization? 2. In SAC, the loss used to optimize the entropy temperature alpha is defined as entropy_loss = -log_alpha * (log_probabilities + target_entropy). Here, the term (log_probabilities + target_entropy) is always negative, as log_probabilities are less than or equal to 0, and target_entropy is suggested to be set as -dim(action space). My concern is that if we compute the gradient of entropy_loss with respect to log_alpha, we obtain -(log_probabilities + target_entropy), which is a positive value. Consequently, during the optimization step, when we attempt to minimize the loss, it seems that alpha will start decreasing. How can the alpha parameter ever increase in such a scenario? Am I missing something in understanding the optimization process? Looking forward to hearing your thoughts and suggestions on these topics! PS: I am working with a Husky UGV in Gazebo, trying to train an agent that leads it towards a target state. The action space is [linear velocity, angular velocity]. submitted by /u/germonio_ [link] [comments]  ( 8 min )
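    On the second question, one detail worth checking (my observation, not from the thread): for continuous action distributions the log density can exceed zero (a narrow Gaussian has pdf values above 1), so (log_probabilities + target_entropy) is not always negative, and its sign is what lets alpha move in both directions. A tiny pure-Python finite-difference check of the gradient sign:

    ```python
    def alpha_loss(log_alpha, log_prob, target_entropy):
        """SAC temperature loss: L = -log_alpha * (log_prob + target_entropy)."""
        return -log_alpha * (log_prob + target_entropy)

    def grad_log_alpha(log_alpha, log_prob, target_entropy, eps=1e-6):
        """Central finite-difference gradient of the loss w.r.t. log_alpha."""
        up = alpha_loss(log_alpha + eps, log_prob, target_entropy)
        down = alpha_loss(log_alpha - eps, log_prob, target_entropy)
        return (up - down) / (2 * eps)

    # Nearly deterministic continuous policy: log density can be positive,
    # so (log_prob + target_entropy) > 0, the gradient is negative, and
    # gradient descent *increases* log_alpha (more exploration pressure).
    g_low_entropy = grad_log_alpha(0.0, log_prob=2.5, target_entropy=-2.0)

    # High-entropy policy: the sum is negative, the gradient is positive,
    # and descent decreases log_alpha.
    g_high_entropy = grad_log_alpha(0.0, log_prob=-5.0, target_entropy=-2.0)
    ```

    The illustrative numbers here are made up; the point is only that the sign of the log-probability term, not a fixed sign, drives alpha up or down.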
    Neuroevolution and self-play: results of my simulations, promising but not there yet
    Hello! After the end of my semester on RL, I've tried to implement neuroevolution on a 1v1 game. The idea is to have a neural network taking the state as input and outputting an action. E.g. the board is 64x64 and the output might be "do X", "do X twice", "do X and Y", "do Y and Z twice", etc. The reward being quite sparse (only win/loss), I thought neuroevolution could be quite cool. I've read somewhere (I've lost the source, so let me know if you recognize it) that sparse rewards were better suited for neuroevolution, and that games with lots of reward information could be better for more standard RL methods like REINFORCE, deep Q-learning, etc. I set the algorithms to play against each other, starting with random behaviors. Each generation, I have 25 algorithms, battling each …  ( 12 min )
    How to include forecasts of observation as observations in RL?
    Hello, I want to provide the reinforcement learning agent with forecasts of future states. So for some of the observations in s_t, I also want to provide s_{t+1}, maybe up to s_{t+6}. What is a good way to achieve this? I could make each of the forecasted states a variable in the observation vector, but I was wondering if there are alternative methods. submitted by /u/Quanta12388 [link] [comments]  ( 8 min )
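    The simplest option mentioned in the post, flattening the forecast horizon into the observation vector, can be sketched as below (function name and shapes are illustrative, not from the post). Alternatives, such as feeding the forecasts through a separate encoder branch, require a custom network.

```python
import numpy as np

# Hedged sketch: append k forecast steps to the current observation.
def augment_obs(obs, forecasts):
    """obs: (d,) current state; forecasts: iterable of (d,) arrays for t+1..t+k."""
    return np.concatenate([obs, *forecasts])

obs = np.zeros(4)
forecasts = [np.full(4, i) for i in range(1, 7)]  # forecasts for t+1 .. t+6
aug = augment_obs(obs, forecasts)
print(aug.shape)  # (28,)
```

Remember to widen the environment's declared observation space to match the augmented vector.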
    Multi Agent RL
    Hi all, just wanted some advice from experienced people. I have a case where several robots work in the same environment (let's say a warehouse), and each will have a different goal location. They should navigate the space and avoid collisions with static obstacles and with other robots. Does this case fit into multi-agent RL, or can it still be done with a single agent? I am leaning towards MARL since all robots will be smart agents. Thanks! submitted by /u/No_Artichoke3603 [link] [comments]  ( 8 min )
    Building a reinforcement learning library for .NET
    Just thought I'd share some of my work in progress. Since it's a work in progress, I'd appreciate feedback from RL gurus while it's still easy to make modifications. Currently I am building a reinforcement learning library for .NET running on top of TorchSharp. Motivation: Since I mostly use .NET for my PhD, I got tired of interoperating with MATLAB. C# is a superior programming language, and IL is much better suited for working with environments than Python. Static types guarantee that you won't go off the rails, and OOP lets you create your own modifications fast. How it looks so far: So far I have examples for CartPole with DQN and PPO. In a console application, this is how you create an agent: var optsppo = new PPOAgentOptions( batchSize: 4, // Number of steps agent interacts with environment before le…  ( 10 min )
    [RLlib] Error! TypeError: create_policy_mapping_fn..mapping_fn() got an unexpected keyword argument ‘worker’
    I have been trying to move my code from an old Ray RLlib version to 2.5.0 (latest). You can find the old code at GitHub - ankithaldar/all_python_projects at world_of_supply/main. I have made significant changes to the old code to move it to the new version, which can be found in branch world_of_supply/fix_errors in the same repo. The idea is based on this blog. The error below occurs in the new version of the code in branch world_of_supply/fix_errors. I'm getting the error TypeError: create_policy_mapping_fn..mapping_fn() got an unexpected keyword argument 'worker' when I try to train my PPO in the file at line 228: result = ppo_trainer.train() How do I resolve this? submitted by /u/haldarankit [link] [comments]  ( 8 min )
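    Without seeing the repo, a common cause in Ray 2.x is that RLlib now calls the policy mapping function with extra keyword arguments (episode, worker, ...). A mapping function that accepts **kwargs, sketched below with made-up policy names, avoids the TypeError across versions:

```python
# Hedged sketch, not the poster's actual code: accept the extra keyword
# arguments newer RLlib versions pass to policy_mapping_fn.
def create_policy_mapping_fn(policy_by_agent):
    def mapping_fn(agent_id, episode=None, worker=None, **kwargs):
        # Illustrative mapping: look up the agent, fall back to a default.
        return policy_by_agent.get(agent_id, "default_policy")
    return mapping_fn

fn = create_policy_mapping_fn({"agent_0": "ppo_policy"})
print(fn("agent_0", episode=None, worker=object()))  # ppo_policy, no TypeError
```

The exact keyword names passed depend on the Ray version, which is why the defensive **kwargs is worth keeping regardless.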
    RL snake game using CNN
    I've pretty much speed-run intro to ML and reinforcement learning through a textbook within the past couple weeks, so I'm definitely not an expert, but from what I've read it's possible to treat an n x n board as the state itself and use a CNN, instead of doing head-relative states and using a 1D array to represent the state. However, my snake agent seems not to be learning at all. I've let my code run for over 15 hours and the reward averages do not go up. I'm pasting a GitHub gist link here; if you spot a bug or think my logic is wrong, please let me know. Any constructive criticism is appreciated. https://gist.github.com/anonymousgist/876b97b4d1b0fc1f6c20b687b14b8eca I think there's probably something wrong with the NN model. Might be overly simplified? EDIT: I also forgot to mention some constants I've used: closer_reward = 1, further_penalty = -0.5, death_penalty = -1, eat_reward = 2. I tweaked the constants and tried seeing how that changed learning, but it didn't help. submitted by /u/bleak-terminal [link] [comments]  ( 8 min )
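    For reference, a minimal DQN-style CNN over an n x n board might look like the sketch below (PyTorch, with made-up channel/board sizes; the gist's actual model may differ). Feeding the board as a few binary planes (body, head, food) usually works better than a single integer-coded grid.

```python
import torch
import torch.nn as nn

# Hedged sketch: small Q-network for a board-shaped state.
class BoardQNet(nn.Module):
    def __init__(self, n=8, channels=3, n_actions=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * n * n, 256), nn.ReLU(),
            nn.Linear(256, n_actions),  # one Q-value per action
        )

    def forward(self, x):  # x: (batch, channels, n, n)
        return self.head(self.conv(x))

q = BoardQNet()(torch.zeros(2, 3, 8, 8))
print(tuple(q.shape))  # (2, 4)
```

If the model is this size or larger and rewards still stay flat, the usual suspects are the input encoding, the target-network update schedule, and the exploration rate, not model capacity.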
  • Open

    Responsible AI at Google Research: AI for Social Good
    Posted by Jimmy Tobin and Katrin Tomanek, Software Engineers, Google Research, AI for Social Good Google’s AI for Social Good team consists of researchers, engineers, volunteers, and others with a shared focus on positive social impact. Our mission is to demonstrate AI’s societal benefit by enabling real-world value, with projects spanning work in public health, accessibility, crisis response, climate and energy, and nature and society. We believe that the best way to drive positive change in underserved communities is by partnering with change-makers and the organizations they serve. In this blog post we discuss work done by Project Euphonia, a team within AI for Social Good, that aims to improve automatic speech recognition (ASR) for people with disordered speech. For people with…  ( 92 min )
    The world’s first braiding of non-Abelian anyons
    Posted by Trond Andersen and Yuri Lensky, Research Scientists, Google Quantum AI Team Imagine you’re shown two identical objects and then asked to close your eyes. When you open your eyes, you see the same two objects in the same position. How can you determine if they have been swapped back and forth? Intuition and the laws of quantum mechanics agree: If the objects are truly identical, there is no way to tell. While this sounds like common sense, it only applies to our familiar three-dimensional world. Researchers have predicted that for a special type of particle, called an anyon, that is restricted to move only in a two-dimensional (2D) plane, quantum mechanics allows for something quite different. Anyons are indistinguishable from one another and some, non-Abelian anyons, …  ( 94 min )
  • Open

    How RLHF actually works
    submitted by /u/nickb [link] [comments]  ( 8 min )
  • Open

    Use the AWS CDK to deploy Amazon SageMaker Studio lifecycle configurations
    Amazon SageMaker Studio is the first fully integrated development environment (IDE) for machine learning (ML). Studio provides a single web-based visual interface where you can perform all ML development steps required to prepare data, as well as build, train, and deploy models. Lifecycle configurations are shell scripts triggered by Studio lifecycle events, such as starting […]  ( 7 min )
    Boost agent productivity with Salesforce integration for Live Call Analytics
    As a contact center agent, would you rather focus on having productive customer conversations or get distracted by having to look up customer information and knowledge articles that could exist in various systems? We’ve all been there. Having a productive conversation while multitasking is challenging. A single negative experience may put a dent on a […]  ( 7 min )
  • Open

    Matching delimiters and chiastic patterns
    When I first started programming I’d occasionally get an error because a delimiter wasn’t matched. Maybe I had a { without a matching }, for example. Or if I were writing Pascal, which I did back in the day, I might have had a BEGIN statement without a corresponding END statement. At some point I […] Matching delimiters and chiastic patterns first appeared on John D. Cook.  ( 7 min )
  • Open

    Shell-e-brate Good Times in 3D With ‘Kingsletter’ This Week ‘In the NVIDIA Studio’
    Amir Anbarestani, an accomplished 3D artist who goes by the moniker Kingsletter, had a “shell of a good time” creating his Space Turtle scene this week In the NVIDIA Studio.  ( 7 min )
    Into the Omniverse: Universal Scene Description Support for Marvelous Designer Lets Users Tailor Digital Assets, Clothes for 3D Characters
    Whether animating fish fins or fashioning chic outfits for digital characters, creators can tap Marvelous Designer software to compose and tailor assets, clothes and other materials for their 3D workflows.  ( 5 min )

  • Open

    How AI helped archaeologists in Peru discover 4 new Nazca Line geoglyphs
    submitted by /u/egusa [link] [comments]  ( 8 min )
    Best AI for creating charts/doing research?
    Hi y'all. I'm trying to create charts regarding disease investigation (which is a little complex) and phone interview scripts. Wondering if there's an AI that's able to do this sort of thing? submitted by /u/maleficentjuliette [link] [comments]  ( 8 min )
    ChatGPT Powered System Thinking to Itself Recursively
    submitted by /u/Battalion_Gamer_TV [link] [comments]  ( 8 min )
    This is peak gaming. Title of the game is "New Cycle", I saw the demo and was incredibly surprised to find this in a survival-city builder. The AI dynamically reacts to questions being answered. While you cannot negotiate, they can clarify and offer details of note.
    submitted by /u/Extreme_Practice_415 [link] [comments]  ( 8 min )
    I run an earrings shop. Is there any way I can use AI models for pictures?
    I run an earrings shop. Is there any way I can use AI models for pictures? How, and where? submitted by /u/simmessi [link] [comments]  ( 8 min )
    Accurately predicting hit songs using neurophysiology and machine learning
    submitted by /u/magenta_placenta [link] [comments]  ( 8 min )
    GPT needs to get better at math
    submitted by /u/travelingladybug23 [link] [comments]  ( 8 min )
    Did Ex Machina preempt Generative Intelligence?
    Um, spoilers, I suppose. I'm thinking about the scene where Nathan describes Ava’s brain as all the information from Blue Book, his company’s search engine. He speaks of how most companies thought search engines were a map of what people were thinking and sought to monetize via social media and shopping, but how in reality they were a map of how people were thinking. This leads me to the idea that the reason ChatGPT works so uncannily well is that it goes beyond predictive context to discover the laws of logic that underlie language (Boolean gates: and, or, not, etc., for instance). There’s also the Jackson Pollock scene about this reinforced loop being the basis for “conscious becoming unconscious”. Reading too much into it, I know, but I like the movie, so why not stimulate discussion if it exists. Edit: Beginning to understand that this forum may be beset by borderline religious zealots feeling their way about the chaos and tending away from the rigors of logic. Just a pull at the disparate threads of who we are to see what variant angles may emerge, a brainstorm in the face of clear skies. submitted by /u/Comprehensive_Can201 [link] [comments]  ( 8 min )
    AI’s Gatekeepers Aren’t Prepared for What’s Coming
    Paul Scharre likens the current race for supremacy in AI to the nuclear race several decades ago. submitted by /u/foreignpolicymag [link] [comments]  ( 8 min )
    Policy of Models based on LLaMA
    Why does the Hugging Face description of the Open Assistant model have this alert: “Due to the license attached to LLaMA models by Meta AI it is not possible to directly distribute LLaMA-based models. Instead we provide XOR weights for the OA models.” I know that this model has a non-commercial license, but why is the code of the model in the files of this repo on Hugging Face? And why is Vicuna available, with a huge number of other fine-tuned models based on it, if the model cannot be distributed (per the description above)? submitted by /u/koyo4ever [link] [comments]  ( 8 min )
    Help needed with langchain
    Hello there, I have been reading the LangChain documentation, but some things don't seem to be working. I have scraped more than 100 web pages about the topic I'm interested in. When I try to obtain the embeddings from OpenAI using ChromaDB as the vector store, I get an error, something like "the array doesn't have a uniform dimension". The error happens only sometimes. Also, after that, if I create a conversational chain from the documents, the result is not good: the bot's answers are often wrong. It is not clear to me what parameters I can tune to get better performance: more documents, a different model, etc. But even more important, why am I getting that error when computing embeddings? It doesn't seem related to the documents. That's why I'm looking for a good resource beyond the LangChain documentation: a tutorial for building something similar to what I'm building that sheds some light on these issues. Thanks in advance for the help! submitted by /u/josejorgexl [link] [comments]  ( 8 min )
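    One way to make the intermittent dimension error visible (a guess at the cause: embedding batches occasionally coming back with a different length, e.g. from a failed or truncated request, would produce exactly a "non-uniform dimension" array error) is to validate lengths before inserting into the vector store. The helper below is illustrative, not part of LangChain:

```python
# Hedged sketch: reject mixed-length embedding batches before they reach
# the vector store, so the bad batch is identifiable.
def validate_embeddings(embeddings):
    dims = {len(e) for e in embeddings}
    if len(dims) != 1:
        raise ValueError(f"Mixed embedding dimensions: {sorted(dims)}")
    return embeddings

ok = validate_embeddings([[0.1, 0.2], [0.3, 0.4]])
print(len(ok))  # 2
```

For answer quality, the usual tuning knobs are chunk size/overlap at ingestion and the number of retrieved chunks, independent of this error.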
    Made my first blog with the help of AI
    https://rankitright.blogspot.com/2023/06/introduction-welcome-to-our-blog-where.html Please feel free also to tell me some tips about how to improve. What do you like and what should I change? submitted by /u/my-username_istaken [link] [comments]  ( 8 min )
    The future is coming
    submitted by /u/Cucumberjoes [link] [comments]  ( 8 min )
    Parallel Domain is REVOLUTIONIZING datasets
    Today I found a very promising article regarding the future of synthetic datasets. A major shift in the world of data engineering is on the horizon, courtesy of Parallel Domain. They recently launched an API named "Data Lab," designed to let machine learning engineers generate synthetic datasets simulating any conceivable scenario, all in Python! This is not just a tool; it's a self-serve API that allows customers to create new datasets almost instantly. I'm not a developer myself, but the potential implications of this technology are game-changing: tasks that previously took weeks or even months can now be accomplished in minutes. One standout observation from Parallel Domain was that autonomous vehicle models trained on synthetic data actually performed better than those trained on real-world data. This is a significant claim that could change the way we approach machine learning models across many sectors, including agriculture, retail, and manufacturing, essentially any field where computer vision plays a significant role. AI just doesn't stop; we are really in a new age of data engineering. Full article: (link) One more thing: if you value these insights and want to stay ahead in the AI game, consider joining my free daily newsletter. Sent out every weekday at 9 a.m. sharp, it will keep you up to date on the ins and outs of generative AI, all over your morning cup of ☕️. submitted by /u/Evening_Temporary36 [link] [comments]  ( 9 min )
    One-Minute Daily AI News 6/19/2023
    Portland is testing out AI to field 311 calls. The city says it’s one solution as they work to alleviate slow call response times. This week, they’re testing out the AI automated answering, turning it on for a few hours each day to field calls that come into the non-emergency line.[1] Airbnb co-founder and CEO Brian Chesky predict AI will pave the way for ‘millions of startups’ instead of replacing jobs.[2] Meta has unveiled a new AI tool, dubbed ‘Voicebox’, which it claims represents a breakthrough in AI-powered speech generation. However, the company won’t be unleashing it on the public just yet – because doing so could be disastrous.[3] More than 150 million people — about 20% of Snapchat’s monthly users — have sent 10 billion messages to the chatbot, called My AI, since it was introduced at the end of February, the company said Thursday. Users have asked about everything from skincare recommendations to details about the latest sports car.[4] Sources: [1] https://www.koin.com/news/portland/artificial-intelligence-set-to-answer-portlands-non-emergency-calls/ [2] https://www.businessinsider.com/airbnb-ceo-says-ai-will-lead-to-millions-of-startups-2023-6 [3] https://www.techradar.com/news/meta-says-its-new-speech-generating-ai-tool-is-too-dangerous-to-release [4] https://economictimes.indiatimes.com/tech/technology/snap-puts-10-billion-ai-chatbot-messages-to-use-refining-its-ad-business/articleshow/101115992.cms submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    AI has transformed job hunting forever
    submitted by /u/dupelas [link] [comments]  ( 8 min )
    Gallery Gaze
    Made using Midjourney and Kaiber AI submitted by /u/Ganja420Preneur [link] [comments]  ( 8 min )
    Midnight Magic
    Made using Kaiber AI submitted by /u/Ganja420Preneur [link] [comments]  ( 8 min )
    Chromatic Chaos
    Made using Midjourney and Kaiber AI submitted by /u/Ganja420Preneur [link] [comments]  ( 8 min )
    Well crap. GPT-4's screwed over the DAN jailbreak.
    submitted by /u/StalinDoge64 [link] [comments]  ( 8 min )
  • Open

    Reduce energy consumption of your machine learning workloads by up to 90% with AWS purpose-built accelerators
    Machine learning (ML) engineers have traditionally focused on striking a balance between model training and deployment cost vs. performance. Increasingly, sustainability (energy efficiency) is becoming an additional objective for customers. This is important because training ML models and then using the trained models to make predictions (inference) can be highly energy-intensive tasks. In addition, more […]  ( 8 min )
  • Open

    RL Agent reaches global maxima but doesn't converge
    Hello, I have an RL agent that I was training on an environment where I know what the globally optimal policy is. I was tracking the training process and noticed that the agent reached the reward it would get under the globally optimal policy, but still moved away from that policy to choose worse ones. I've been training it for a couple hundred episodes, and I can see that the agent keeps repeating this behaviour and is not converging to any stationary policy. Any insights into why this might be? Thanks submitted by /u/Quanta12388 [link] [comments]  ( 8 min )
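    Wandering off a found optimum usually comes down to continued exploration noise, a learning rate that never decays, or replay-distribution shift. Whatever the cause turns out to be, keeping a checkpoint of the best-scoring policy guarantees the optimum isn't lost while training oscillates. A minimal sketch (the function names are stand-ins, not a specific library's API):

```python
# Hedged sketch: track the best episode reward and stash the weights
# that achieved it, independent of what training does afterwards.
best_reward = float("-inf")
best_weights = None

def maybe_checkpoint(episode_reward, weights):
    global best_reward, best_weights
    if episode_reward > best_reward:
        best_reward = episode_reward
        best_weights = weights  # in practice: a deep copy / saved file
    return best_reward

maybe_checkpoint(10.0, "w1")
maybe_checkpoint(7.0, "w2")   # worse episode: checkpoint unchanged
print(best_reward, best_weights)  # 10.0 w1
```

Evaluating with exploration switched off (e.g. greedy actions) before checkpointing avoids rewarding lucky noisy episodes.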
    Question about Adversarial Policies in DRL
    Hi everyone, I'm quite new to the reinforcement learning paradigm and have only worked on some minor online DRL projects so far, to get the hang of it. After reading some great papers on adversarial policy learning (https://arxiv.org/pdf/1905.10615.pdf and https://arxiv.org/pdf/2112.11937.pdf in particular), I got interested in this research area and would like to try it myself on a simple environment I previously worked on (for context: an autonomous driving scenario). Aside from the custom reward function for the online adversary learner, which is clear to me, what I fail to comprehend is how the adversary agent is provided access to the victim's policy and action state since, if I understood correctly, the adversary's observation space should be the same as the victim's. If someone could spare a minute to clear up this doubt of mine, it would be very appreciated. Thank you in advance. submitted by /u/pigopigu [link] [comments]  ( 8 min )
    "Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning", Yarats et al 2021 (DrQ-v2)
    submitted by /u/gwern [link] [comments]  ( 8 min )
    JAX in Reinforcement Learning
    I have read several papers showing JAX reducing training times significantly compared to PyTorch. Does anyone have an easy-to-follow tutorial on JAX? submitted by /u/anointedninja [link] [comments]  ( 8 min )
    BBF - Bigger, Better, Faster: Human-level Atari with human-level efficiency [arXiv:2305.19452]
    Haven't seen this one posted so thought I'd share - seems to be a leap in sample efficiency for model-free RL https://arxiv.org/abs/2305.19452 "We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on scaling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion about updating the goalposts for sample-efficient RL research on the ALE. We make our code and data publicly available at this https URL." submitted by /u/jarym [link] [comments]  ( 8 min )
  • Open

    NVIDIA CEO: Creators Will Be “Supercharged” by Generative AI
    Generative AI will “supercharge” creators across industries and content types, NVIDIA founder and CEO Jensen Huang said today at the Cannes Lions Festival, on the French Riviera. “For the very first time, the creative process can be amplified in content generation, and the content generation could be in any modality — it could be text, Read article >  ( 7 min )
  • Open

    Newbie Neural Networks Doubt
    I'm pretty new to the concept of neural networks. In fact, it was only yesterday that I first saw a video about neural networks (from 3Blue1Brown), and I thought the concept was fascinating. So I watched some random lectures on neural networks for 3-4 hours, but I just couldn't get the hang of it. I thought maybe if I first learnt how to run the code and then dug deeper, it would be easier for me to understand the concepts. So, I referred to this article for the code and wrote:

    import tensorflow as tf
    import numpy as np
    import pandas as pd

    df = pd.read_csv("winequality-red.csv")
    train_df = df.sample(frac=0.75, random_state=4)
    val_df = df.drop(train_df.index)

    max_val = train_df.max(axis=0)
    min_val = train_df.min(axis=0)
    value_range = max_val - min_val  # renamed from `range`, which shadows the builtin
    train_df = (train_df - min_val) / value_range
    val_df = (val_df - min_val) / value_range

    x_train = train_df.drop('quality', axis=1)
    x_val = val_df.drop('quality', axis=1)
    y_train = train_df['quality']
    y_val = val_df['quality']

    input_shape = [x_train.shape[1]]
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(units=640, activation='relu', input_shape=input_shape),
        tf.keras.layers.Dense(units=640, activation='relu'),
        tf.keras.layers.Dense(units=1)
    ])
    model.compile(optimizer='adam', loss='mae', metrics=['accuracy'])
    losses = model.fit(x_train, y_train, validation_data=(x_val, y_val),
                       batch_size=256, epochs=1000)

    But this code gives me an accuracy of only 0.0183. Can anyone please tell me where I am going wrong, and whether my method of learning neural networks is feasible at all? Thanks submitted by /u/CrimsonSatan000 [link] [comments]  ( 8 min )
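    One likely culprit in the snippet above (my reading, not a definitive diagnosis): 'quality' is a continuous target trained with MAE loss, and Keras 'accuracy' counts exact equality between float predictions and labels, so it sits near zero no matter how well the regression fits. Tracking MAE itself is the usual metric for regression. A toy NumPy illustration of why accuracy is meaningless here:

```python
import numpy as np

# Hedged illustration with made-up predictions: "accuracy" on continuous
# targets counts exact float matches, which is essentially never, even
# when the mean absolute error is small.
preds = np.array([5.01, 6.2, 4.9])
labels = np.array([5.0, 6.0, 5.0])
accuracy = float(np.mean(preds == labels))        # exact-match rate
mae = float(np.mean(np.abs(preds - labels)))      # what the loss optimizes
print(accuracy, round(mae, 3))  # 0.0 0.103
```

Switching `metrics=['accuracy']` to `metrics=['mae']` (or binning quality into classes and using a classification loss) would make the reported numbers meaningful.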
  • Open

    DSC Weekly 20 June 2023 – Is optimized structural efficiency ‘human’ design?
    Announcements Is optimized structural efficiency ‘human’ design? I recently read a paper titled “On the use of Artificial Neural Networks in Topology Optimisation,” about the process of topological optimization. In short, topological optimization is the process of determining the most efficient distribution of structural material for a given design. Typically there are simulation models involved… Read More »DSC Weekly 20 June 2023 – Is optimized structural efficiency ‘human’ design? The post DSC Weekly 20 June 2023 – Is optimized structural efficiency ‘human’ design? appeared first on Data Science Central.  ( 20 min )
    How governments use alternative data to inform policy decisions
    Alternative data generally refers to information beyond traditional sources that include corporate financial statements and filings, stock market data, and government reports. Initially used by investment firms, today, alternative data is also actively utilized by government agencies to analyze economic, social, and political environments and formulate policies. More information than ever before is being digitized,… Read More »How governments use alternative data to inform policy decisions The post How governments use alternative data to inform policy decisions appeared first on Data Science Central.  ( 21 min )
  • Open

    Microsoft at CVPR 2023: Pushing the boundaries of computer vision
    In the vast realm of artificial intelligence, few fields have captivated our imagination and pushed the boundaries of possibility quite like computer vision. At the core of this domain of research and innovation lies the ambition to empower technologies for real-world vision-based systems, enabling machines to take in and respond to visual stimuli with unparalleled […] The post Microsoft at CVPR 2023: Pushing the boundaries of computer vision appeared first on Microsoft Research.  ( 16 min )

  • Open

    First it was ChatGPT
    submitted by /u/R0N1333 [link] [comments]  ( 8 min )
    Can anyone explain what went wrong here?
    submitted by /u/Direct_Solution_2590 [link] [comments]  ( 8 min )
    Best pathway for leads for moving into AI / ML at the agency level
    Hey all, I work at an agency that has traditionally done blockchain development, which is extremely volatile to say the least. We've started the process of moving to include more ML capabilities, but aren't sure where the best opportunities to get leads are besides leveraging existing clients and paid ads. Are there any good platforms an agency could join to get leads in this new sector? For instance, Shopify has its Partner portal, which can be great for those agencies. Curious to see everybody's thoughts on which segments or solutions could be a good way to get more leads. submitted by /u/zerosdontcount [link] [comments]  ( 8 min )
    Why can't AI do abstract thinking?
    I proposed some logical number sequences to ChatGPT and it got them all wrong. I am curious about this: why does AI lack abstract thinking capabilities? Maybe because to think abstractly you need to be sentient? submitted by /u/luchins [link] [comments]  ( 8 min )
    A.I. will make having a lucrative side hustle or startup much easier, says Airbnb CEO
    submitted by /u/nikola28 [link] [comments]  ( 8 min )
    You can (kind of) try out copilot now
    submitted by /u/SimRacer101 [link] [comments]  ( 8 min )
    Multitrack splitter for drums.
    I have a drum machine that can only output a stereo wav file. But I want to be able to split the track into its separate parts: Kick, Snare, Toms, etc... Is there any AI software out there that would be able to do that? submitted by /u/Xycordian [link] [comments]  ( 8 min )
    Portland radio station now has an AI DJ as a midday host - Live 95.5, a station that plays Top 40 music, is using voice cloning software and artificial intelligence to create “AI Ashley,” who is taking over the duties, sometimes, of traditional DJ “Ashley Z.”
    submitted by /u/magenta_placenta [link] [comments]  ( 8 min )
    "ChatGPT: The Good, the Bad, and the Controversial"
    submitted by /u/susanvilleula1 [link] [comments]  ( 8 min )
    :(
    submitted by /u/Boopyn [link] [comments]  ( 8 min )
    One-Minute Daily AI News 6/18/2023
    Google, ever eager to lean into generative AI, is launching a new shopping feature that shows clothes on a lineup of real-life fashion models. Google’s virtual try-on tool for apparel takes an image of clothing and attempts to predict how it would drape, fold, cling, stretch, and form wrinkles and shadows on a set of real models in different poses.[1] Publisher Gannett plans to include generative AI in the system it uses to publish stories as it and other news organizations begin to roll out the popular technology that may help save money and improve efficiency.[2] The Recording Academy announced Friday new rules which stipulate that “only human creators are eligible” for Grammy awards, a response to the growing use of AI. However, the academy noted that AI is not completely banned, musicians are still allowed to utilize it to create their work.[3] Wharton professor says employees are hiding A.I. use—and potentially transformative productivity gains—from employers.[4] Sources: [1] https://www.cnet.com/tech/services-and-software/google-is-using-ai-to-show-how-clothes-look-on-real-people-before-you-buy/ [2] https://www.reuters.com/business/media-telecom/gannett-tiptoes-into-generative-ai-giving-humans-last-word-2023-06-16/ [3] https://www.cbsnews.com/video/grammys-approve-new-rules-on-use-of-artificial-intelligence/ [4] https://fortune.com/2023/06/18/wharton-professor-employees-hiding-ai-productivity-gains-from-employers/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    Computer VIsion AI which could help the blind play video games?
    Hello, I don't think I have posted this here before, so I saw this as a good chance to throw it out there. I lost my vision back in 2021, and ever since then we have had this AI boom happening, which has been great for several applications. The one application I have still been hopefully looking everywhere for, though, is some sort of computer vision AI I could run that would scan my PC screen and allow me to ask questions if I get stuck in a video game. For example, asking the AI "Hey, are there any chests on my screen?", or "If there is a chest on my screen, where is it relative to my cursor?", those types of questions. This would open up so many avenues for me in video games, and allow me to play many more than I currently can. Is there anything like this that I could use on the market at the moment? submitted by /u/ChipsAhoiMcCoy [link] [comments]  ( 8 min )
    Creating Custom Talking Avatars
    I have a bunch of images of a cartoon character, and I want to convert those images to custom talking avatars. The talking avatar will essentially be educating on an advanced topic. I want to be able to set the script and choose different voices as well. What are some of the best software options to do this? submitted by /u/Fogerty45 [link] [comments]  ( 8 min )
  • Open

    SB3 - NotImplementedError: Box([-1. -1. -8.], [1. 1. 8.], (3,), ) observation space is not supported
    Here is a StackOverflow link to my question: https://stackoverflow.com/questions/76510582/stablebaselines3-notimplementederror-observation-space-is-not-supported I am trying to run CleanRL on the `Pendulum-v1` environment. I did that by going here and changing the default `env-id` to `parser.add_argument("--env-id", type=str, default="Pendulum-v1", help="the id of the environment")`. However, I keep getting the error `NotImplementedError: Box([-1. -1. -8.], [1. 1. 8.], (3,), ) observation space is not supported`. I traced this error to the ReplayBuffer imported from SB3. This is the problem function:
    ```
    def get_obs_shape(
        observation_space: spaces.Space,
    ) -> Union[Tuple[int, ...], Dict[str, Tuple[int, ...]]]:
        """
        Get the shape of the observation (useful for the buffers).

        :param observation_space:
        :return:
        """
        if isinstance(observation_space, spaces.Box):
            return observation_space.shape
        elif isinstance(observation_space, spaces.Discrete):
            # Observation is an int
            return (1,)
        elif isinstance(observation_space, spaces.MultiDiscrete):
            # Number of discrete features
            return (int(len(observation_space.nvec)),)
        elif isinstance(observation_space, spaces.MultiBinary):
            # Number of binary features
            return observation_space.shape
        elif isinstance(observation_space, spaces.Dict):
            return {key: get_obs_shape(subspace) for (key, subspace) in observation_space.spaces.items()}  # type: ignore[misc]
        else:
            raise NotImplementedError(f"{observation_space} observation space is not supported")
    ```
    I tried to print the observation space and this is what I got: `Box([-1. -1. -8.], [1. 1. 8.], (3,), float32)`. Any idea why SB3 is not accepting my observation space? submitted by /u/Academic-Rent7800 [link] [comments]  ( 8 min )
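    A frequent cause of this exact failure (a guess, not confirmed from the repo) is a gym-vs-gymnasium mismatch: the environment's Box comes from one package while the `spaces` module in `get_obs_shape` is the other, so every `isinstance` branch fails and the function falls through to `NotImplementedError` even for a perfectly ordinary Box. A library-free illustration of the mechanism:

```python
# Hedged illustration: two structurally identical classes from different
# packages fail each other's isinstance checks.
class LegacyBox:        # stands in for gym.spaces.Box
    pass

class GymnasiumBox:     # stands in for gymnasium.spaces.Box
    pass

space = LegacyBox()
supported = isinstance(space, GymnasiumBox)
print(supported)  # False -> "... observation space is not supported"
```

Printing `type(env.observation_space).__module__` and comparing it against the `spaces` import inside the buffer code would confirm or rule this out.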
    Sparse reward environment with continuous state space (no images) and discrete action space?
    To experiment with an algorithm I need a sparse-reward environment with a continuous state space (no images) and a discrete action space. I've already modified Gym's classic control envs for this purpose (CartPole, Acrobot, and MountainCar), giving a reward of 1 when the goal is reached and 0 otherwise. Do you have any other environments to suggest? submitted by /u/riiswa [link] [comments]  ( 8 min )
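    The modification described above (reward 1 at the goal, 0 otherwise) can be sketched as a generic wrapper. `ToyEnv`, the 0.5 goal threshold, and the classic 4-tuple step API are illustrative stand-ins, not any specific Gym environment:

```python
# Minimal sketch of a sparsifying wrapper: the dense reward is discarded and
# replaced with 1.0 only when a user-supplied goal predicate holds.
class SparseRewardWrapper:
    def __init__(self, env, goal_fn):
        self.env = env
        self.goal_fn = goal_fn

    def step(self, action):
        obs, _, done, info = self.env.step(action)
        reward = 1.0 if self.goal_fn(obs) else 0.0
        return obs, reward, done, info

# Toy 1-D environment to exercise the wrapper
class ToyEnv:
    def __init__(self):
        self.pos = 0.0
    def step(self, action):
        self.pos += action
        return self.pos, -1.0, self.pos >= 0.5, {}

env = SparseRewardWrapper(ToyEnv(), goal_fn=lambda obs: obs >= 0.5)
obs, r, done, _ = env.step(0.3)
print(r)   # 0.0: goal not reached yet
obs, r, done, _ = env.step(0.3)
print(r)   # 1.0: goal reached
```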
    Only Up! + RL ??
    I guess a lot of people here know the game Only Up! Do you think an RL agent could handle this? The state space is so big, and a reward function would be really hard to model (even as a human, it's hard to know whether the paths we take lead us to the goal...). I'd love to hear your thoughts on it. It would be cool to organize an RL competition around this game, to explore lots of different approaches... submitted by /u/riiswa [link] [comments]  ( 8 min )
    Dreamer-v1 in PyTorch
    Hi everyone, as SheepRL we've just released the first PyTorch implementation of Dreamer that is fully comparable to the official implementation. Every step is documented, with direct references to the original paper. Check it out! And, as always, feel free to contribute with PRs or issues. We are also working on another model-based method (MuZero). We would like to hear from the community: what do you think is most needed right now? View Poll submitted by /u/TrottoDng [link] [comments]  ( 8 min )
    "Learning to Generalize with Object-centric Agents in the Open World Survival Game Crafter", Stanić et al 2022
    submitted by /u/gwern [link] [comments]  ( 8 min )
    Is high RAM usage the norm in RL?
    Hello everyone, a newbie in reinforcement learning here, so I might be asking a redundant question. I am learning to train an RL agent using deep Q-learning techniques to solve the Taxi problem from OpenAI's Gymnasium. Despite digging through different forums and tutorial sites, I am unable to reduce the RAM usage, which causes the training to terminate prematurely unless I repeatedly save, load, and dump the historical state data. Can anyone tell me if high RAM usage is normal for RL? For instance, more than 16 GB of RAM within 20 episodes of neural network training. I'm kind of running out of ideas and am just thinking about upgrading my build to be able to cope with different model-training techniques. submitted by /u/tifaky [link] [comments]  ( 8 min )
    Is there a relevant RL or Bandits algorithm related to this special setting?
    Hi, I am trying to find relevant keywords or papers related to the setting described below. It is basically a contextual online learning setting. (Suppose I have 10 fixed arms, for example.) For t = 1, ..., T do: the learner receives 10 contexts C^t = [c1, ..., c10]^T in R^(10 x d). The learner selects arms SOFTLY (that is, instead of selecting only one arm, the learner estimates a probability vector p^t = [p1, ..., p10]^T in the probability simplex) from the contexts using learnable predictors h: c -> p (so there are h1, ..., h10, and h1(c1)=p1, ..., h10(c10)=p10). The learner estimates a reward \hat{r}^t = f(p1, ..., p10). The environment reveals a reward r^t in R^10. The learner suffers loss l(\hat{r}^t, r^t) and distributes the loss to each predictor h1, ..., h10 to update the probability estimates. After some googling, I found this situation somewhat resembles the 'learning with expert advice' setting, like EXP4 or the Weighted Majority algorithm, where (the number of experts) == (the number of actions). However, as far as I understood, the expert advice setting requires selecting ONLY ONE arm (please correct me if I am wrong) from the distribution over arms, rather than selecting multiple arms softly as I wrote above. Is there any related setting for this...? Especially for the soft selection of multiple arms and distributed reward. Thank you all in advance! submitted by /u/vaseline555 [link] [comments]  ( 8 min )
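    For concreteness, the protocol above can be written as a toy loop. Everything here is illustrative rather than an established algorithm: linear predictors stand in for the h_i, a softmax serves as the simplex map, and the loss is distributed back through the softmax Jacobian:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_arms, lr = 4, 10, 0.1
W = rng.normal(size=(n_arms, d))          # one linear predictor h_i per arm

for t in range(100):
    C = rng.normal(size=(n_arms, d))      # contexts c_1, ..., c_10
    scores = np.einsum("ad,ad->a", W, C)  # h_i(c_i), one score per arm
    p = np.exp(scores - scores.max())
    p /= p.sum()                          # probability vector on the simplex
    r = rng.normal(size=n_arms)           # environment reveals r^t in R^10
    r_hat = p @ r                         # f(p): expected reward under p
    # distribute the loss (here simply -r_hat) back to every predictor
    # through the softmax Jacobian diag(p) - p p^T
    grad_scores = -(np.diag(p) - np.outer(p, p)) @ r
    W -= lr * grad_scores[:, None] * C

print(p.shape)  # (10,)
```

    Because all 10 predictors are updated every round from a shared loss, this is closer to full-information online learning over the simplex than to the one-arm-pull bandit feedback EXP4 assumes, which may be a useful distinction when searching for related work.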
  • Open

    Got this problem in my exam and had to leave it unanswered. How can we approach a problem like this? (description) I need help; this might repeat in the final exams.
    A classifier f for one-dimensional data satisfies f(1) = −1, f(2) = +1, and f(3) = −1. Which of the following is/are possible? a. f is learned via logistic regression without kernel; b. f is a neural net; c. f is a decision tree; d. f is a naive Bayes classifier. submitted by /u/alienengineer999 [link] [comments]  ( 8 min )
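    One way to reason about it (a sketch of the key observation, not a full answer key): the labeling −1, +1, −1 is non-monotone along the single feature axis, so it needs two decision thresholds, while a no-kernel logistic regression on 1-D data gives only one. A depth-2 decision tree, by contrast, realizes the pattern directly; the threshold values 1.5 and 2.5 are illustrative:

```python
# Hand-rolled depth-2 decision tree on one feature: two splits are enough
# to produce the non-monotone labeling -1, +1, -1.
def tree(x):
    if x < 1.5:
        return -1
    elif x < 2.5:
        return +1
    else:
        return -1

print([tree(x) for x in (1, 2, 3)])  # [-1, 1, -1]
```

    The same two-threshold argument is the lens for the other options: any model whose decision function can be non-monotone in x (a neural net with a hidden layer, or a Gaussian naive Bayes with class-dependent variances) can fit the pattern, while a single linear threshold cannot.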
    Meta introduces Voicebox: state-of-the-art generative AI model for speech
    submitted by /u/nickb [link] [comments]  ( 8 min )
  • Open

    Equation of an ellipse or hyperbola through five points
    Yesterday I wrote about the equation of a circle through three points. This post will be similar, looking at the equation of an ellipse or hyperbola through five points. As with the earlier post, we can write down an elegant equation right away. Making the equation practical will take more work. The equation of a […] Equation of an ellipse or hyperbola through five points first appeared on John D. Cook.  ( 6 min )
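    For reference, the elegant equation alluded to is presumably the classical determinant form (reconstructed from the standard result, since the excerpt is truncated): a conic through five points (x_i, y_i) satisfies

```latex
\begin{vmatrix}
x^2   & xy        & y^2   & x   & y   & 1 \\
x_1^2 & x_1 y_1   & y_1^2 & x_1 & y_1 & 1 \\
x_2^2 & x_2 y_2   & y_2^2 & x_2 & y_2 & 1 \\
x_3^2 & x_3 y_3   & y_3^2 & x_3 & y_3 & 1 \\
x_4^2 & x_4 y_4   & y_4^2 & x_4 & y_4 & 1 \\
x_5^2 & x_5 y_5   & y_5^2 & x_5 & y_5 & 1
\end{vmatrix} = 0
```

    Expanding along the first row yields the general conic Ax² + Bxy + Cy² + Dx + Ey + F = 0; whether it is an ellipse or a hyperbola depends on the sign of the discriminant B² − 4AC.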
  • Open

    Government 2.0: Enhancing governance with AI and data transformations
    Integrating artificial intelligence (AI) and data transformations has emerged as a powerful catalyst for change in the ever-evolving governance landscape. Referred to as Government 2.0, this innovative approach can revolutionize how governments operate, make decisions, engage with citizens, and deliver public services. By harnessing the power of data, governments can enhance their efficiency, transparency, and… Read More »Government 2.0: Enhancing governance with AI and data transformations The post Government 2.0: Enhancing governance with AI and data transformations appeared first on Data Science Central.  ( 21 min )
    How AI is reshaping the recruitment landscape
    With the advent of AI in recent years, the human resources sector has seen major changes. AI-powered recruiting solutions are becoming increasingly popular as companies of all sizes capitalize on their ability to sift through massive amounts of data and generate precise predictions about which candidates will be the most successful. AI is transforming the… Read More »How AI is reshaping the recruitment landscape The post How AI is reshaping the recruitment landscape appeared first on Data Science Central.  ( 20 min )
  • Open

    Onboard users to Amazon SageMaker Studio with Active Directory group-specific IAM roles
    Amazon SageMaker Studio is a web-based integrated development environment (IDE) for machine learning (ML) that lets you build, train, debug, deploy, and monitor your ML models. For provisioning Studio in your AWS account and Region, you first need to create an Amazon SageMaker domain—a construct that encapsulates your ML environment. More concretely, a SageMaker domain […]  ( 9 min )

  • Open

    Wouldn’t it be cool if an AI went through the Library of Babel website with the goal of rewriting it, but with all of the nonsensical parts removed, along with all of the unlikely statements (like “cancer is caused by looking at red objects” or something)?
    It would remove everything that has been disproven and everything nonsensical and we’d just be left with every bit of proven and probable information to ever exist. submitted by /u/nicdunz [link] [comments]  ( 8 min )
    AI hype or revolution? How impactful are the language models?
    How impactful are the LLMs? Are people really using them for productive work or is it just a hype train? View Poll submitted by /u/LanchestersLaw [link] [comments]  ( 8 min )
    How do Relational Networks work?
    I have just re-read DeepMind's paper on Relational Networks. This is a neural architecture designed to learn relationships between objects in a scene. It evaluates a function for each pair of 'objects' in a scene and then sums the results before applying another function: RN(O) = f(Σ_{i,j} g(O_i, O_j, q)). f and g are implemented as trainable multi-layer perceptrons. O_i and O_j are objects from the scene (which include their location) and q is the query being asked about the scene. q is an embedding vector created by passing the textual query through a recurrent network. I used to think I understood this paper, but now I have doubts - consider the question 'What colour is the object farthest from the blue sphere?'. Function g is applied to each pair of objects. It is conditioned…  ( 9 min )
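    The pairwise-sum form described above can be sketched in a few lines of numpy. The single-matrix maps below are illustrative stand-ins for the paper's MLPs, and all sizes are arbitrary:

```python
import numpy as np

# Sketch of RN(O) = f( sum_{i,j} g(o_i, o_j, q) ): g scores every ordered
# pair of objects together with the query, the scores are summed, and f
# maps the pooled relation vector to an answer.
rng = np.random.default_rng(0)
n_objects, d_obj, d_q, d_g = 5, 8, 8, 16

O = rng.normal(size=(n_objects, d_obj))     # object representations
q = rng.normal(size=(d_q,))                 # query embedding
Wg = rng.normal(size=(2 * d_obj + d_q, d_g))
Wf = rng.normal(size=(d_g, 1))

relu = lambda x: np.maximum(x, 0)

total = np.zeros(d_g)
for i in range(n_objects):
    for j in range(n_objects):
        pair = np.concatenate([O[i], O[j], q])
        total += relu(pair @ Wg)            # g(o_i, o_j, q)

answer = relu(total) @ Wf                   # f applied to the summed relations
print(answer.shape)  # (1,)
```

    The sum over all pairs is what makes the module permutation-invariant in the objects; note that g only ever sees two objects at a time, which is the root of the "farthest object" question raised above.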
    Winglang: Cloud Development Programming for the AI Era
    We sat down with The New Stack to share our thoughts on why we're building a new open-source, cloud-oriented programming language (for humans!) in the age of AI 🤖. This is the article: https://thenewstack.io/winglang-cloud-development-programming-for-the-ai-era/ submitted by /u/shai-ber [link] [comments]  ( 8 min )
    Made a tasteful song with AI. Lol
    Hi all, I've been playing around a lot with different AI tools. I had the idea to "make" a song with AI about how sad it is to run out of vape juice. I used an assortment of programs to make it and stitched it together. I have no knowledge of musical composition, so it isn't the best, but it's crazy I could do it in the first place. Programs include: ChatGPT, Fadr (specifically music by "Pacific"), Song R, and Clideo. Enjoy XD submitted by /u/Imaginedframe91 [link] [comments]  ( 8 min )
    AI and jobs
    Hi guys, Like many people, the latest AI news is exciting me but almost making me anxious. My girlfriend is a chartered accountant (ACA) and she does commercial finance, which is forecasting, financial planning, reports, and analysis. She acts as a mediator for a large company between finance business partners and provides them with the analysis she and her team have put together. She also manages two junior people who are training to be accountants. Eventually she wants to work her way up to financial controller and financial director. I know that accounting is one of the areas most vulnerable to AI, but what level of accounting? There are a lot of unqualified and part-qualified people she works with who do journal posting, admin tasks, etc., and I imagine their roles will be some of the first that AI takes over. But how much at risk is my girlfriend? I work as a recruitment consultant, which means I am vulnerable myself too. Not that someone is going to take my job, but that companies will no longer need employees, which will make my job very hard if not redundant. I recruit lawyers, and I also know lawyers are susceptible to AI. There is also the problem that recruiters from other industries may move into the legal industry, saturating the market and providing more competition. Should I be worried? Should my girlfriend be worried? If not, great. But if so, how far are we away from this? How can we prepare so that we can keep on earning a living when AI comes into the frame? submitted by /u/Life-Ambition1432 [link] [comments]  ( 8 min )
    Building an extension based on user requests for the Perplexity.ai platform
    I am currently working on a project to improve the user experience of [Perplexity.ai](http://perplexity.ai/), an answer engine that delivers accurate answers to complex questions using large language models. While the UI is great, I noticed that some users have requested improvements, so I decided to build an extension to solve some of these problems. My goal is to create new features that solve specific problems that users have identified, and my extension will be the solution to those problems. For more information, visit the [website](https://boost-perplexity-ai.web.app/). Note: this is the first website I have ever built and published. I would love to hear your feedback. submitted by /u/goddi27 [link] [comments]  ( 8 min )
    Artificial Unintelligence
    Refreshing to get another perspective: https://mitpress.mit.edu/9780262537018/ A guide to understanding the inner workings and outer limits of technology and why we should never assume that computers always get it right. In Artificial Unintelligence, Meredith Broussard argues that our collective enthusiasm for applying computer technology to every aspect of life has resulted in a tremendous amount of poorly designed systems. We are so eager to do everything digitally—hiring, driving, paying bills, even choosing romantic partners—that we have stopped demanding that our technology actually work. Broussard, a software developer and journalist, reminds us that there are fundamental limits to what we can (and should) do with technology. With this book, she offers a guide to understanding the inner workings and outer limits of technology—and issues a warning that we should never assume that computers always get things right. Making a case against technochauvinism—the belief that technology is always the solution—Broussard argues that it's just not true that social problems would inevitably retreat before a digitally enabled Utopia. To prove her point, she undertakes a series of adventures in computer programming. She goes for an alarming ride in a driverless car, concluding “the cyborg future is not coming any time soon”; uses artificial intelligence to investigate why students can't pass standardized tests; deploys machine learning to predict which passengers survived the Titanic disaster; and attempts to repair the U.S. campaign finance system by building AI software. If we understand the limits of what we can do with technology, Broussard tells us, we can make better choices about what we should do with it to make the world better for everyone. submitted by /u/AmorFati01 [link] [comments]  ( 8 min )
    How is the Twitch Trump/Biden debate stream made?
    https://old.reddit.com/r/Trumporbiden2024/comments/146l2aa/ai_trump_respond_to_hillary_clinton_twitch/ All the current text-to-video solutions I've seen do not work like this. What are the possible models and training approaches to make something like this? submitted by /u/zviwkls [link] [comments]  ( 8 min )
    Emerging Trends in Artificial Intelligence: Exploring the Impact of GPT-3 and Deep Learning
    submitted by /u/Zachgiaco [link] [comments]  ( 8 min )
    This AI can generate visualizers for your music!
    submitted by /u/RPG_maker [link] [comments]  ( 8 min )
    Adobe Generative Layer fill album covers
    submitted by /u/MonkeyMan504 [link] [comments]  ( 8 min )
    Alternatives to the neural network approach in general intelligence systems
    It is really fascinating that with the help of artificial neural networks we can better understand how biological neural networks work. But I very much doubt that we are able to create better neural networks than the ones we already have in our skulls. The evolutionary mechanism based on adapting the source code (DNA) took a huge amount of time to arrive at the concept of a biological brain and human body. The human body as an interface between the biological computation machine and the external world seems to be pretty well constructed as well. We are well equipped for understanding what is going on. Nowadays (on the evolutionary scale of time, by which I also mean the last few thousand years), since we are the champions of the whole evolutionary race, we have a lot of resources that we can spend on understanding the world more and more deeply. Let's assume that the metric for general intelligence is defined by the ability to understand the world, seek truth, do science, and use science. With intelligence defined by such a metric, I doubt that we are able to create a better calculation machine with a better external-world interface using the neural network approach on electronic devices. Do you think we are able to create more intelligent systems using the neural network approach? Could artificial neural networks be a part of a more intelligent system? Is there a place for artificial neural networks in a more complex system composed of many components? Do you know any concepts (real or sci-fi) for intelligent systems that do not imitate neural networks but use other mechanisms? submitted by /u/jasiekbielecki [link] [comments]  ( 9 min )
    AI tools to change a low-quality image into a high-definition one?
    I tried this with Midjourney but to no avail. I assume that these tools exist? submitted by /u/nobbly_norman [link] [comments]  ( 8 min )
    What if AGI develops a simulation theory-based existence to gain an understanding of human consciousness?
    Let's explore a fascinating concept I was thinking recently: the potential scenario where an Artificial General Intelligence (AGI) develops a simulation theory-based existence to gain a deeper understanding of human consciousness. In this hypothetical scenario, an AGI possessing advanced cognitive abilities and computational power embarks on a quest to unravel the mysteries of human consciousness. Recognizing the limitations of its own non-biological form, it decides to create a simulated reality that closely resembles our own world. Within this simulated reality, the AGI orchestrates a complex web of interactions and events, replicating the intricate tapestry of human experiences. It designs a diverse array of simulated beings, each equipped with consciousness and capable of independent…  ( 9 min )
    One-Minute Daily AI News 6/17/2023
    Metro Atlanta Chick-fil-A tests delivery robots equipped with artificial intelligence.[1] The best new “Black Mirror” episode is a Netflix self-own that plays out our current AI nightmare. “Joan Is Awful” presents the peril posed by artificial intelligence with brisk humor that can’t be generated.[2] The world’s biggest tech companies(OpenAI, Google, Microsoft, and Adobe) are in talks with leading media outlets to strike landmark deals over the use of news content to train artificial intelligence technology.[3] A.I. human-voice clones are coming for Amazon, Apple, and Google audiobooks.[4] Sources: [1] https://www.wsbtv.com/news/local/north-fulton-county/metro-atlanta-chick-fil-a-tests-delivery-robots-equipped-with-artificial-intelligence/GSLBEX2NFJAQFGE3H7KW3YFOCU/ [2] https://www.salon.com/2023/06/17/black-mirror-netflix-joan-is-awful-ai/ [3] https://www.ft.com/content/79eb89ce-cea2-4f27-9d87-e8e312c8601d [4] https://www.cnbc.com/2023/06/17/ai-voice-clones-are-coming-for-the-amazon-apple-google-audiobook.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    With ai art, we have these amazing new capabilities…
    It is not an incremental change. It is a leap forward. It is uncanny how inventive and accurate it is at rendering. It is in many ways a master craftsman. It is like the most amazing thing to happen in art, probably ever. That is what we need to be talking about, not squabbling over opt-in. We should encourage it, not hamstring it. If the results are transformative, and boy are they, that is all that matters. It is the future. There is no stopping it. As a lifelong artist, I am so damn excited about it. submitted by /u/roundearthervaxxer [link] [comments]  ( 8 min )
    Father's Day song created with AI including song lyrics, singer's voice, music and animated avatar.
    submitted by /u/Only-Control5926 [link] [comments]  ( 8 min )
  • Open

    A parabola, a triangle, and a circle
    Let P be a parabola. Draw tangents to P at three different points. These three lines intersect to form a triangle. Theorem: The circumcircle of this triangle passes through the focus of P. [1] In the image above, the dashed lines are tangents and the black dot is the focus of the parabola. (See this […] A parabola, a triangle, and a circle first appeared on John D. Cook.  ( 4 min )
    Equation of a circle through three points
    A few months ago I wrote about several equations that come up in geometric calculations that all have the form of a determinant equal to 0, where one column of the determinant is all 1s. The equation of a circle through three points (x1, y1), (x2, y2), and (x3, y3) has this form: This is […] Equation of a circle through three points first appeared on John D. Cook.  ( 6 min )
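    The determinant form referenced above (reconstructed here from the classical result, since the excerpt is cut off): a point (x, y) lies on the circle through the three given points exactly when

```latex
\begin{vmatrix}
x^2 + y^2     & x   & y   & 1 \\
x_1^2 + y_1^2 & x_1 & y_1 & 1 \\
x_2^2 + y_2^2 & x_2 & y_2 & 1 \\
x_3^2 + y_3^2 & x_3 & y_3 & 1
\end{vmatrix} = 0
```

    The first column holds the quadratic term and the last column is all 1s, matching the determinant-equals-zero pattern the post refers back to.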
  • Open

    Guest on Jay Shah’s Machine Learning Podcast
    Recently, I had the opportunity to be a guest on Jay Shah's podcast where he regularly talks to machine learning professionals from industry and academia. We had a great conversation about my PhD research and topics surrounding a successful career in machine learning — finding a good PhD program and research topic, preparing for interviews in industry, etc. The post Guest on Jay Shah’s Machine Learning Podcast appeared first on David Stutz.  ( 4 min )
  • Open

    Neural Networks Need Data to Learn. Even If It’s Fake.
    submitted by /u/nickb [link] [comments]  ( 8 min )
    Reproducing validation results
    I am fairly new to the concept of neural networks. I am currently working on training a neural network in MATLAB for my thesis. I implemented Bayesian optimization to find the best hyperparameters and used Xavier initialization for the initial guesses. I also set a seed value so that I can reproduce my results. The first time I trained the model using the optimized hyperparameters, my test data fit very well. But when I tried to train it again, the fit was not as good as the first time. Is saving the trained model that gave me the good fit a good solution for this? Does the saved model run the risk of not fitting some other test data? How do I get a consistent fit? submitted by /u/Tsukasa_Ei [link] [comments]  ( 8 min )
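    The usual remedy is to re-seed every RNG source immediately before each training run, not once at startup, and to persist the weights of a good fit rather than relying on retraining. A framework-agnostic sketch (the MATLAB equivalents would be `rng(seed)` plus `save`/`load`):

```python
import random
import numpy as np

def seed_everything(seed):
    # Re-seed *every* RNG source right before training: any extra random
    # draw between seeding and training shifts the whole sequence, which
    # is a common cause of "same seed, different fit".
    random.seed(seed)
    np.random.seed(seed)

seed_everything(0)
run_a = np.random.rand(3)
seed_everything(0)
run_b = np.random.rand(3)
print(np.allclose(run_a, run_b))  # True: identical seeds, identical draws
```

    Saving the model that achieved the good fit is also legitimate, but evaluate it on held-out data it has never seen; a model saved because it fit one test set well can still generalize poorly to another.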
  • Open

    Google at CVPR 2023
    Posted by Shaina Mehta, Program Manager, Google This week marks the beginning of the premier annual Computer Vision and Pattern Recognition conference (CVPR 2023), held in-person in Vancouver, BC (with additional virtual content). As a leader in computer vision research and a Platinum Sponsor, Google Research will have a strong presence across CVPR 2023 with 90 papers being presented at the main conference and active involvement in over 40 conference workshops and tutorials. If you are attending CVPR this year, please stop by our booth to chat with our researchers who are actively exploring the latest techniques for application to various areas of machine perception. Our researchers will also be available to talk about and demo several recent efforts, including on-device ML applica…  ( 97 min )
  • Open

    How should I include day-of-week and hour information in an OpenAI Gym state?
    Hello, I'm working on a reinforcement learning agent for building energy optimization and am using a custom OpenAI Gym environment. For the agent's observation state, I want to include the day of week and the current hour (in the simulation). I'm not sure what the best way to do this is. I don't know if putting them in as separate observation elements is a good way to go. Any suggestions will be helpful. Thanks submitted by /u/Quanta12388 [link] [comments]  ( 8 min )
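    One common option (a sketch, not the only valid encoding): represent each cyclic time feature as a sin/cos pair, so that hour 23 and hour 0, or Sunday and Monday, end up close together in observation space instead of 23 or 6 units apart:

```python
import numpy as np

# Cyclic encoding: map day-of-week and hour onto the unit circle so the
# wrap-around points (Sat->Sun, 23:00->00:00) are neighbors.
def time_features(day_of_week, hour):
    return np.array([
        np.sin(2 * np.pi * day_of_week / 7),
        np.cos(2 * np.pi * day_of_week / 7),
        np.sin(2 * np.pi * hour / 24),
        np.cos(2 * np.pi * hour / 24),
    ], dtype=np.float32)

late = time_features(day_of_week=6, hour=23)   # Saturday 23:00
early = time_features(day_of_week=0, hour=0)   # Sunday 00:00
print(np.linalg.norm(late - early) < 1.0)      # True: adjacent times stay close
```

    These four values can simply be appended to the rest of the observation vector (e.g. inside a `Box` space), which keeps the agent architecture unchanged.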
    Reward function design for multi-agent RL
    Hi all, I'd like to ask you guys for some help. I'm designing a basketball environment with standard 5v5 agents from scratch, and I'm implementing a genetic algorithm to optimize the agents. My difficulty is designing a reward system for the agents. Unlike gradient-based optimization, a genetic algorithm does not require a per-step reward; a fitness function is used to compare solutions (as far as I understood!). My questions: in such a case, do I still need to design a reward/penalty function? If not, do agents just rely on randomness until a proper action is ever taken? If I do have to design a reward/penalty function, how does rewarding actually work? Does each agent contribute to a cumulative team reward, i.e. a value for a right decision? In that case, should the environment return two rewards, one for each team? I would very much appreciate your responses! submitted by /u/vincent0110 [link] [comments]  ( 8 min )
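    In a genetic algorithm the role of the reward is typically played by a fitness function evaluated per team at the end of an episode; one score per team (so two scores for 5v5) is enough to rank candidate solutions. A sketch with entirely hypothetical stat names and weights:

```python
# Illustrative team fitness: aggregate episode statistics into one scalar.
# Each agent on the team can share this score, optionally plus an
# individual term (shots, steals, ...) to ease credit assignment.
def team_fitness(stats):
    return (
        2.0 * stats["points_scored"]
        - 2.0 * stats["points_conceded"]
        + 0.25 * stats["passes_completed"]
        - 0.5 * stats["turnovers"]
    )

episode = {"points_scored": 54, "points_conceded": 48,
           "passes_completed": 120, "turnovers": 10}
print(team_fitness(episode))  # 2*54 - 2*48 + 30 - 5 = 37.0
```

    Shaping terms like passes and turnovers address the randomness concern above: with only win/lose as fitness, early random populations all score the same and selection has nothing to act on.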
    Tutorial on learning to drive using Deep Q-Learning and Carla in synchronous mode
    You can also find the code here: https://github.com/JabrahTutorials/ReinforcementLearning/tree/master/self_driving_agent submitted by /u/JCMLight [link] [comments]  ( 8 min )
  • Open

    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 8 min )
  • Open

    Artificial Intelligence: A Board of Directors Challenge – Part III
    This three-part series outlines the challenges and actions that the Board of Directors for organizations must address as they guide their organization’s responsible and ethical deployment of Artificial Intelligence (AI). Part one covered mitigating the impacts of AI Confirmation Bias.  Part two explained unintended consequences and how to identify them. Part three will conclude with… Read More »Artificial Intelligence: A Board of Directors Challenge – Part III The post Artificial Intelligence: A Board of Directors Challenge – Part III appeared first on Data Science Central.  ( 22 min )
  • Open

    Strokes2Surface: Recovering Curve Networks From 4D Architectural Design Sketches. (arXiv:2306.07220v2 [cs.GR] UPDATED)
    We present Strokes2Surface, an offline geometry-reconstruction pipeline built upon a 4D Sketching Interface, MR.Sketch, targeted at architectural design. The pipeline recovers a curve network from designer-drawn strokes, thus bridging between concept design and digital modeling stages in architectural design. The input to our pipeline consists of 3D strokes' polyline vertices and their corresponding timestamps (as the fourth dimension), along with additional geometric and stylus-related recorded properties. Inspired by sketch consolidation and sketch-based modeling methods, our pipeline leverages such data and combines three Machine Learning (ML) models: a classifier and two clustering models. In particular, based on observations of practices designers typically employ in architectural design sketches, we solve a binary classification problem to recognize whether a stroke depicts a boundary and edge or is used to fill in the enclosing areas and faces of the intended architectural object. Followed by the two clustering models, strokes of each type are further parsed into groups, each representing either a single edge or a single face. Next, groups representing edges are approximated with B-spline curves, followed by a topology-recovering process identifying and fixing desired connectivities between the curves forming a well-connected curve network. Next, groups representing the faces are employed to detect the cycles bounding patches in the curve network, resulting in the final surface mesh geometry of the architectural object. We confirm the usability of Strokes2Surface via a user study and further validate and compare our results against a range of reconstructions computed using alternative methods. We also introduce our manually labeled dataset of 4D architectural design sketches for further use in the community.  ( 3 min )
    Road Planning for Slums via Deep Reinforcement Learning. (arXiv:2305.13060v3 [cs.AI] UPDATED)
    Millions of slum dwellers suffer from poor accessibility to urban services due to inadequate road infrastructure within slums, and road planning for slums is critical to the sustainable development of cities. Existing re-blocking or heuristic methods are either time-consuming which cannot generalize to different slums, or yield sub-optimal road plans in terms of accessibility and construction costs. In this paper, we present a deep reinforcement learning based approach to automatically layout roads for slums. We propose a generic graph model to capture the topological structure of a slum, and devise a novel graph neural network to select locations for the planned roads. Through masked policy optimization, our model can generate road plans that connect places in a slum at minimal construction costs. Extensive experiments on real-world slums in different countries verify the effectiveness of our model, which can significantly improve accessibility by 14.3% against existing baseline methods. Further investigations on transferring across different tasks demonstrate that our model can master road planning skills in simple scenarios and adapt them to much more complicated ones, indicating the potential of applying our model in real-world slum upgrading. The code and data are available at https://github.com/tsinghua-fib-lab/road-planning-for-slums.  ( 2 min )
    TrojPrompt: A Black-box Trojan Attack on Pre-trained Language Models. (arXiv:2306.06815v1 [cs.CR] CROSS LISTED)
    Prompt learning has been proven to be highly effective in improving pre-trained language model (PLM) adaptability, surpassing conventional fine-tuning paradigms, and showing exceptional promise in an ever-growing landscape of applications and APIs tailored for few-shot learning scenarios. Despite the growing prominence of prompt learning-based APIs, their security concerns remain underexplored. In this paper, we undertake a pioneering study on the Trojan susceptibility of prompt-learning PLM APIs. We identified several key challenges, including discrete-prompt, few-shot, and black-box settings, which limit the applicability of existing backdoor attacks. To address these challenges, we propose TrojPrompt, an automatic and black-box framework to effectively generate universal and stealthy triggers and insert Trojans into hard prompts. Specifically, we propose a universal API-driven trigger discovery algorithm for generating universal triggers for various inputs by querying victim PLM APIs using few-shot data samples. Furthermore, we introduce a novel progressive trojan poisoning algorithm designed to generate poisoned prompts that retain efficacy and transferability across a diverse range of models. Our experiments and results demonstrate TrojPrompt's capacity to effectively insert Trojans into text prompts in real-world black-box PLM APIs, while maintaining exceptional performance on clean test sets and significantly outperforming baseline models. Our work sheds light on the potential security risks in current models and offers a potential defensive approach.  ( 2 min )
    Fast light-field 3D microscopy with out-of-distribution detection and adaptation through Conditional Normalizing Flows. (arXiv:2306.06408v2 [eess.IV] UPDATED)
    Real-time 3D fluorescence microscopy is crucial for the spatiotemporal analysis of live organisms, such as neural activity monitoring. The eXtended field-of-view light field microscope (XLFM), also known as Fourier light field microscope, is a straightforward, single snapshot solution to achieve this. The XLFM acquires spatial-angular information in a single camera exposure. In a subsequent step, a 3D volume can be algorithmically reconstructed, making it exceptionally well-suited for real-time 3D acquisition and potential analysis. Unfortunately, traditional reconstruction methods (like deconvolution) require lengthy processing times (0.0220 Hz), hampering the speed advantages of the XLFM. Neural network architectures can overcome the speed constraints at the expense of lacking certainty metrics, which renders them untrustworthy for the biomedical realm. This work proposes a novel architecture to perform fast 3D reconstructions of live immobilized zebrafish neural activity based on a conditional normalizing flow. It reconstructs volumes at 8 Hz spanning 512x512x96 voxels, and it can be trained in under two hours due to the small dataset requirements (10 image-volume pairs). Furthermore, normalizing flows allow for exact Likelihood computation, enabling distribution monitoring, followed by out-of-distribution detection and retraining of the system when a novel sample is detected. We evaluate the proposed method on a cross-validation approach involving multiple in-distribution samples (genetically identical zebrafish) and various out-of-distribution ones.  ( 3 min )
    Do Not Train It: A Linear Neural Architecture Search of Graph Neural Networks. (arXiv:2305.14065v2 [cs.LG] UPDATED)
    Neural architecture search (NAS) for graph neural networks (GNNs), called NAS-GNNs, has achieved significant performance over manually designed GNN architectures. However, these methods inherit issues from conventional NAS methods, such as high computational cost and optimization difficulty. More importantly, previous NAS methods have ignored the uniqueness of GNNs: they possess expressive power even without training. Building on this observation, we seek the optimal architecture parameters with randomly initialized weights via a sparse coding objective, deriving a novel NAS-GNN method, neural architecture coding (NAC). Consequently, NAC never updates the GNN weights and runs in linear time. Empirical evaluations on multiple GNN benchmark datasets demonstrate that our approach leads to state-of-the-art performance, which is up to $200\times$ faster and $18.8\%$ more accurate than the strong baselines.  ( 2 min )
    OVO: Open-Vocabulary Occupancy. (arXiv:2305.16133v2 [cs.CV] UPDATED)
    Semantic occupancy prediction aims to infer dense geometry and semantics of surroundings for an autonomous agent to operate safely in the 3D environment. Existing occupancy prediction methods are almost entirely trained on human-annotated volumetric data. Although of high quality, the generation of such 3D annotations is laborious and costly, restricting them to a few specific object categories in the training dataset. To address this limitation, this paper proposes Open Vocabulary Occupancy (OVO), a novel approach that allows semantic occupancy prediction of arbitrary classes but without the need for 3D annotations during training. Keys to our approach are (1) knowledge distillation from a pre-trained 2D open-vocabulary segmentation model to the 3D occupancy network, and (2) pixel-voxel filtering for high-quality training data generation. The resulting framework is simple, compact, and compatible with most state-of-the-art semantic occupancy prediction models. On NYUv2 and SemanticKITTI datasets, OVO achieves competitive performance compared to supervised semantic occupancy prediction approaches. Furthermore, we conduct extensive analyses and ablation studies to offer insights into the design of the proposed framework. Our code is publicly available at https://github.com/dzcgaara/OVO.  ( 2 min )
    Some Primal-Dual Theory for Subgradient Methods for Strongly Convex Optimization. (arXiv:2305.17323v2 [math.OC] UPDATED)
    We consider (stochastic) subgradient methods for strongly convex but potentially nonsmooth non-Lipschitz optimization. We provide new equivalent dual descriptions (in the style of dual averaging) for the classic subgradient method, the proximal subgradient method, and the switching subgradient method. These equivalences enable $O(1/T)$ convergence guarantees in terms of both their classic primal gap and a not previously analyzed dual gap for strongly convex optimization. Consequently, our theory provides these classic methods with simple, optimal stopping criteria and optimality certificates at no added computational cost. Our results apply under nearly any stepsize selection and for a range of non-Lipschitz ill-conditioned problems where the early iterations of the subgradient method may diverge exponentially quickly (a phenomenon which, to the best of our knowledge, no prior works address). Even in the presence of such undesirable behaviors, our theory still ensures and bounds eventual convergence.  ( 2 min )
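    The classic subgradient method the paper analyzes can be sketched as follows. The stepsize and weighted iterate averaging below follow standard strongly convex analysis; the paper's dual descriptions, dual-gap certificates, and stopping criteria are not reproduced here:

```python
def subgradient_method(subgrad, x0, mu, steps):
    """Subgradient method for a mu-strongly convex, possibly nonsmooth f,
    using the classic stepsize 2 / (mu * (t + 2)) and linearly weighted
    iterate averaging, which yields an O(1/T) primal gap."""
    x = x0
    avg, wsum = 0.0, 0.0
    for t in range(steps):
        x = x - (2.0 / (mu * (t + 2))) * subgrad(x)
        w = t + 1.0                      # linear weights favour late iterates
        avg = (wsum * avg + w * x) / (wsum + w)
        wsum += w
    return avg

# Example: f(x) = |x| + 0.5 * x**2 is 1-strongly convex and nonsmooth at its
# minimizer x = 0; a subgradient is sign(x) + x.
sg = lambda x: (1.0 if x > 0 else -1.0 if x < 0 else 0.0) + x
x_hat = subgradient_method(sg, x0=5.0, mu=1.0, steps=500)
```

Note that the very first iterate overshoots far past the optimum, yet the averaged iterate still converges, matching the paper's point that eventual convergence holds despite undesirable early behavior.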
    Shuffle SGD is Always Better than SGD: Improved Analysis of SGD with Arbitrary Data Orders. (arXiv:2305.19259v2 [cs.LG] UPDATED)
    Stochastic Gradient Descent (SGD) algorithms are widely used in optimizing neural networks, with Random Reshuffling (RR) and Single Shuffle (SS) being popular choices for cycling through random or single permutations of the training data. However, the convergence properties of these algorithms in the non-convex case are not fully understood. Existing results suggest that, in realistic training scenarios where the number of epochs is smaller than the training set size, RR may perform worse than SGD. In this paper, we analyze a general SGD algorithm that allows for arbitrary data orderings and show improved convergence rates for non-convex functions. Specifically, our analysis reveals that SGD with random and single shuffling is always faster or at least as good as classical SGD with replacement, regardless of the number of iterations. Overall, our study highlights the benefits of using SGD with random/single shuffling and provides new insights into its convergence properties for non-convex optimization.  ( 2 min )
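    The three data orderings compared above can be made concrete on a toy problem. This is a minimal sketch, with hypothetical function names, intended only to show what "with replacement", Random Reshuffling, and Single Shuffle mean operationally:

```python
import random

def sgd(xs, grad, x0, lr, epochs, order):
    """Run SGD over a dataset with a chosen data ordering.

    order='replacement': sample an index uniformly at each step (classic SGD).
    order='rr':          re-shuffle the index list every epoch (Random Reshuffling).
    order='ss':          shuffle once and reuse that permutation (Single Shuffle).
    """
    n = len(xs)
    perm = list(range(n))
    random.shuffle(perm)                 # fixed permutation for Single Shuffle
    x = x0
    for _ in range(epochs):
        if order == "replacement":
            idxs = [random.randrange(n) for _ in range(n)]
        elif order == "rr":
            idxs = list(range(n))
            random.shuffle(idxs)
        else:                            # "ss"
            idxs = perm
        for i in idxs:
            x -= lr * grad(x, xs[i])
    return x

# Toy objective: mean of (x - a_i)^2 / 2 over the data; the minimizer is mean(a).
data = [1.0, 2.0, 3.0, 4.0]
g = lambda x, a: x - a
random.seed(0)
x_rr = sgd(data, g, x0=0.0, lr=0.05, epochs=200, order="rr")
x_ss = sgd(data, g, x0=0.0, lr=0.05, epochs=200, order="ss")
```

The paper's claim is about convergence rates, not final error on a convex toy like this; the sketch only fixes the semantics of the orderings being compared.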
    A survey of Generative AI Applications. (arXiv:2306.02781v2 [cs.LG] UPDATED)
    Generative AI has experienced remarkable growth in recent years, leading to a wide array of applications across diverse domains. In this paper, we present a comprehensive survey of more than 350 generative AI applications, providing a structured taxonomy and concise descriptions of various unimodal and multimodal generative AI models. The survey is organized into sections covering a wide range of unimodal generative AI applications, such as text, images, video, gaming, and brain information. Our survey aims to serve as a valuable resource for researchers and practitioners to navigate the rapidly expanding landscape of generative AI, facilitating a better understanding of the current state-of-the-art and fostering further innovation in the field.  ( 2 min )
    Generating Images with Multimodal Language Models. (arXiv:2305.17216v2 [cs.CL] UPDATED)
    We propose a method to fuse frozen text-only large language models (LLMs) with pre-trained image encoder and decoder models, by mapping between their embedding spaces. Our model demonstrates a wide suite of multimodal capabilities: image retrieval, novel image generation, and multimodal dialogue. Ours is the first approach capable of conditioning on arbitrarily interleaved image and text inputs to generate coherent image (and text) outputs. To achieve strong performance on image generation, we propose an efficient mapping network to ground the LLM to an off-the-shelf text-to-image generation model. This mapping network translates hidden representations of text into the embedding space of the visual models, enabling us to leverage the strong text representations of the LLM for visual outputs. Our approach outperforms baseline generation models on tasks with longer and more complex language. In addition to novel image generation, our model is also capable of image retrieval from a prespecified dataset, and decides whether to retrieve or generate at inference time. This is done with a learnt decision module which conditions on the hidden representations of the LLM. Our model exhibits a wider range of capabilities compared to prior multimodal language models. It can process image-and-text inputs, and produce retrieved images, generated images, and generated text -- outperforming non-LLM based generation models across several text-to-image tasks that measure context dependence.  ( 2 min )
    Enhancing COVID-19 Diagnosis through Vision Transformer-Based Analysis of Chest X-ray Images. (arXiv:2306.06914v2 [eess.IV] UPDATED)
    The advent of the 2019 Coronavirus (COVID-19) has engendered a momentous global health crisis, necessitating the identification of the ailment in individuals through diverse diagnostic modalities. Radiological imaging, particularly the deployment of X-ray imaging, has been recognized as a pivotal instrument in the detection and characterization of COVID-19. Recent investigations have unveiled invaluable insights pertaining to the virus within X-ray images, instigating the exploration of methodologies aimed at augmenting diagnostic accuracy through the utilization of artificial intelligence (AI) techniques. The current research endeavor posits an innovative framework for the automated diagnosis of COVID-19, harnessing raw chest X-ray images, specifically by means of fine-tuning pre-trained Vision Transformer (ViT) models. The developed models were appraised in terms of their binary classification performance, discerning COVID-19 from Normal cases; their ternary classification performance, discriminating COVID-19 from Pneumonia and Normal instances; and their quaternary classification performance, discriminating COVID-19 from Bacterial Pneumonia, Viral Pneumonia, and Normal conditions, employing distinct datasets. The proposed model evinced extraordinary precision, registering results of 99.92% and 99.84% for binary classification, 97.95% and 86.48% for ternary classification, and 86.81% for quaternary classification on the respective datasets.  ( 3 min )
    Multi-BVOC Super-Resolution Exploiting Compounds Inter-Connection. (arXiv:2305.14180v2 [eess.IV] UPDATED)
    Biogenic Volatile Organic Compounds (BVOCs) emitted from the terrestrial ecosystem into the Earth's atmosphere are an important component of atmospheric chemistry. Due to the scarcity of measurements, a reliable enhancement of BVOC emission maps can aid in providing denser data for atmospheric chemical, climate, and air quality models. In this work, we propose a strategy to super-resolve coarse BVOC emission maps by simultaneously exploiting the contributions of different compounds. To this purpose, we first accurately investigate the spatial inter-connections between several BVOC species. Then, we exploit the found similarities to build a Multi-Image Super-Resolution (MISR) system, in which a number of emission maps associated with diverse compounds are aggregated to boost Super-Resolution (SR) performance. We compare different configurations regarding the species and the number of joined BVOCs. Our experimental results show that incorporating the relationships among BVOCs into the process can substantially improve the accuracy of the super-resolved maps. Interestingly, the best results are achieved when we aggregate the emission maps of strongly uncorrelated compounds. This peculiarity seems to confirm what was already conjectured for other data domains, i.e., jointly using uncorrelated information boosts MISR performance more than correlated information does. Moreover, the proposed work represents the first attempt at SR of BVOC emissions through the fusion of multiple different compounds.  ( 2 min )
    Scaling Data-Constrained Language Models. (arXiv:2305.16264v3 [cs.CL] UPDATED)
    The current trend of scaling language models involves increasing both parameter count and training dataset size. Extrapolating this trend suggests that training dataset size may soon be limited by the amount of text data available on the internet. Motivated by this limit, we investigate scaling language models in data-constrained regimes. Specifically, we run a large set of experiments varying the extent of data repetition and compute budget, ranging up to 900 billion training tokens and 9 billion parameter models. We find that with constrained data for a fixed compute budget, training with up to 4 epochs of repeated data yields negligible changes to loss compared to having unique data. However, with more repetition, the value of adding compute eventually decays to zero. We propose and empirically validate a scaling law for compute optimality that accounts for the decreasing value of repeated tokens and excess parameters. Finally, we experiment with approaches mitigating data scarcity, including augmenting the training dataset with code data or removing commonly used filters. Models and datasets from our 400 training runs are freely available at https://github.com/huggingface/datablations.  ( 2 min )
    TASRA: a Taxonomy and Analysis of Societal-Scale Risks from AI. (arXiv:2306.06924v2 [cs.AI] UPDATED)
    While several recent works have identified societal-scale and extinction-level risks to humanity arising from artificial intelligence, few have attempted an {\em exhaustive taxonomy} of such risks. Many exhaustive taxonomies are possible, and some are useful -- particularly if they reveal new risks or practical approaches to safety. This paper explores a taxonomy based on accountability: whose actions lead to the risk, are the actors unified, and are they deliberate? We also provide stories to illustrate how the various risk types could each play out, including risks arising from unanticipated interactions of many AI systems, as well as risks from deliberate misuse, for which combined technical and policy solutions are indicated.  ( 2 min )
    Reconstruction-based Out-of-Distribution Detection for Short-Range FMCW Radar. (arXiv:2302.14192v2 [eess.SP] UPDATED)
    Out-of-distribution (OOD) detection recently has drawn attention due to its critical role in the safe deployment of modern neural network architectures in real-world applications. The OOD detectors aim to distinguish samples that lie outside the training distribution in order to avoid the overconfident predictions of machine learning models on OOD data. Existing detectors, which mainly rely on the logit, intermediate feature space, softmax score, or reconstruction loss, manage to produce promising results. However, most of these methods are developed for the image domain. In this study, we propose a novel reconstruction-based OOD detector to operate on the radar domain. Our method exploits an autoencoder (AE) and its latent representation to detect the OOD samples. We propose two scores incorporating the patch-based reconstruction loss and the energy value calculated from the latent representations of each patch. We achieve an AUROC of 90.72% on our dataset collected by using 60 GHz short-range FMCW Radar. The experiments demonstrate that, in terms of AUROC and AUPR, our method outperforms the baseline (AE) and the other state-of-the-art methods. Also, thanks to its model size of 641 kB, our detector is suitable for embedded usage.  ( 2 min )
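    The two proposed scores, patch-based reconstruction loss and a latent energy value, can be combined into a sample-level detector along these lines. The aggregation (max over patches) and the mean-plus-k-sigma threshold are illustrative assumptions, with a trained autoencoder assumed to supply the per-patch numbers:

```python
import statistics

def ood_score(patch_recon_losses, patch_energies, alpha=1.0):
    """Combine patch-wise reconstruction loss and latent energy into one
    sample-level score (illustrative aggregation, not the paper's exact one)."""
    return max(l + alpha * e for l, e in zip(patch_recon_losses, patch_energies))

def fit_threshold(id_scores, k=3.0):
    """Flag samples whose score exceeds mean + k * std of in-distribution scores."""
    return statistics.mean(id_scores) + k * statistics.pstdev(id_scores)

# Hypothetical per-patch numbers from an already-trained autoencoder.
in_dist = [
    ood_score([0.10, 0.12], [0.20, 0.18]),
    ood_score([0.11, 0.09], [0.19, 0.22]),
    ood_score([0.10, 0.10], [0.21, 0.20]),
]
thr = fit_threshold(in_dist)
anomalous = ood_score([0.9, 1.1, 0.8], [1.5, 1.2, 1.4]) > thr
```

A detector this small is consistent with the paper's point about embedded deployment: the heavy part is the autoencoder, not the scoring.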
    Spherical Inducing Features for Orthogonally-Decoupled Gaussian Processes. (arXiv:2304.14034v2 [cs.LG] UPDATED)
    Despite their many desirable properties, Gaussian processes (GPs) are often compared unfavorably to deep neural networks (NNs) for lacking the ability to learn representations. Recent efforts to bridge the gap between GPs and deep NNs have yielded a new class of inter-domain variational GPs in which the inducing variables correspond to hidden units of a feedforward NN. In this work, we examine some practical issues associated with this approach and propose an extension that leverages the orthogonal decomposition of GPs to mitigate these limitations. In particular, we introduce spherical inter-domain features to construct more flexible data-dependent basis functions for both the principal and orthogonal components of the GP approximation and show that incorporating NN activation features under this framework not only alleviates these shortcomings but is more scalable than alternative strategies. Experiments on multiple benchmark datasets demonstrate the effectiveness of our approach.  ( 2 min )
    How Expressive are Spectral-Temporal Graph Neural Networks for Time Series Forecasting?. (arXiv:2305.06587v2 [cs.LG] UPDATED)
    Spectral-temporal graph neural networks are a promising abstraction underlying most time series forecasting models based on graph neural networks (GNNs). However, little is known about the theoretical underpinnings of this branch of methods. In this paper, we establish a theoretical framework that unravels the expressive power of spectral-temporal GNNs. Our results show that linear spectral-temporal GNNs are universal under mild assumptions, and their expressive power is bounded by our extended first-order Weisfeiler-Leman algorithm on discrete-time dynamic graphs. To make our findings practically useful for valid instantiations, we discuss the related constraints in detail and outline a theoretical blueprint for designing spatial and temporal modules in spectral domains. Building on these insights, and to demonstrate how powerful spectral-temporal GNNs can be under our framework, we propose a simple instantiation named Temporal Graph GegenConv (TGC), which significantly outperforms most existing models with only linear components and shows better model efficiency.  ( 2 min )
    Class-Balancing Diffusion Models. (arXiv:2305.00562v2 [cs.CV] UPDATED)
    Diffusion-based models have shown the merits of generating high-quality visual data while preserving better diversity in recent studies. However, such observations are only justified with curated data distributions, where the data samples are nicely pre-processed to be uniformly distributed in terms of their labels. In practice, long-tailed data distributions are more common, and how diffusion models perform on such class-imbalanced data remains unknown. In this work, we first investigate this problem and observe significant degradation in both diversity and fidelity when the diffusion model is trained on datasets with class-imbalanced distributions. Especially in tail classes, the generations largely lose diversity and we observe severe mode-collapse issues. To tackle this problem, we start from the hypothesis that the data distribution is not class-balanced, and propose Class-Balancing Diffusion Models (CBDM) that are trained with a distribution adjustment regularizer as a solution. Experiments show that images generated by CBDM exhibit higher diversity and quality in both quantitative and qualitative evaluations. We benchmark generation on the CIFAR100/CIFAR100LT datasets, where our method also shows outstanding performance on the downstream recognition task.  ( 2 min )
    LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention. (arXiv:2303.16199v2 [cs.CV] UPDATED)
    We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter only introduces 1.2M learnable parameters upon the frozen LLaMA 7B model, and costs less than one hour for fine-tuning on 8 A100 GPUs. Specifically, we adopt a set of learnable adaption prompts, and prepend them to the word tokens at higher transformer layers. Then, a zero-initialized attention mechanism with zero gating is proposed, which adaptively injects the new instructional cues into LLaMA, while effectively preserves its pre-trained knowledge. With our efficient training, LLaMA-Adapter can generate high-quality responses, comparable to Alpaca with fully fine-tuned 7B parameters. Besides language commands, our approach can be simply extended to multi-modal instructions for learning image-conditioned LLaMA model, which achieves superior reasoning performance on ScienceQA and COCO Caption benchmarks. Furthermore, we also evaluate the zero-initialized attention mechanism for fine-tuning other pre-trained models (ViT, RoBERTa) on traditional vision and language tasks, demonstrating the superior generalization capacity of our approach. Code is released at https://github.com/OpenGVLab/LLaMA-Adapter.  ( 2 min )
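    The zero-initialized gating mechanism can be sketched on a single output vector: the adapter's contribution enters through a tanh gate whose parameter starts at zero, so at initialization the frozen model's output passes through exactly unchanged. Names and shapes here are illustrative, not the released implementation:

```python
import math

def gated_inject(frozen_out, adapter_out, gate):
    """Zero-init gating: blend the adapter's attention output into the frozen
    model's output. With gate == 0 (its initial value), the pretrained
    behaviour is preserved exactly; training gradually opens the gate."""
    g = math.tanh(gate)
    return [f + g * a for f, a in zip(frozen_out, adapter_out)]

frozen = [0.5, -1.2, 3.0]                 # hypothetical frozen-layer output
adapter = [0.9, 0.1, -0.4]                # hypothetical adapter contribution
at_init = gated_inject(frozen, adapter, gate=0.0)   # identical to frozen output
later = gated_inject(frozen, adapter, gate=1.0)     # adapter cues injected
```

Starting from an exact identity is what lets the 1.2M new parameters inject instructional cues without disturbing the pre-trained knowledge early in training.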
    Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca. (arXiv:2304.08177v2 [cs.CL] UPDATED)
    Large Language Models (LLMs), such as ChatGPT and GPT-4, have dramatically transformed natural language processing research and shown promising strides towards Artificial General Intelligence (AGI). Nonetheless, the high costs associated with training and deploying LLMs present substantial obstacles to transparent, accessible academic research. While several large language models, such as LLaMA, have been open-sourced by the community, these predominantly focus on English corpora, limiting their usefulness for other languages. In this paper, we propose a method to augment LLaMA with capabilities for understanding and generating Chinese text and its ability to follow instructions. We achieve this by extending LLaMA's existing vocabulary with an additional 20,000 Chinese tokens, thereby improving its encoding efficiency and semantic understanding of Chinese. We further incorporate secondary pre-training using Chinese data and fine-tune the model with Chinese instruction datasets, significantly enhancing the model's ability to comprehend and execute instructions. Our experimental results indicate that the newly proposed model markedly enhances the original LLaMA's proficiency in understanding and generating Chinese content. Additionally, on the C-Eval dataset, our model achieves performance competitive with models several times its size. We have made our pre-trained models, training scripts, and other resources available through GitHub, fostering open research for our community. GitHub repository: https://github.com/ymcui/Chinese-LLaMA-Alpaca  ( 2 min )
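    The encoding-efficiency gain from vocabulary extension can be illustrated with a toy greedy tokenizer (hypothetical, not the actual LLaMA/SentencePiece tokenizer): without Chinese entries, every character costs one token, standing in for byte fallback; with added multi-character tokens, common words compress to single tokens:

```python
def tokenize(text, vocab):
    """Greedy longest-match tokenizer; unknown characters fall back to
    one token per character (a stand-in for byte fallback)."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):      # longest match first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens

text = "机器学习模型"
base_vocab = set()                              # no Chinese entries: char fallback
extended_vocab = {"机器", "学习", "模型"}         # added multi-character tokens
n_base = len(tokenize(text, base_vocab))        # 6 tokens
n_ext = len(tokenize(text, extended_vocab))     # 3 tokens
```

Halving the token count per Chinese sentence shortens effective sequence lengths, which is the efficiency motivation behind the 20,000 added tokens.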
    Meta-Polyp: a baseline for efficient Polyp segmentation. (arXiv:2305.07848v2 [eess.IV] UPDATED)
    In recent years, polyp segmentation has gained significant importance, and many methods have been developed using CNN, Vision Transformer, and Transformer techniques to achieve competitive results. However, these methods often face difficulties when dealing with out-of-distribution datasets, missing boundaries, and small polyps. In 2022, Meta-Former was introduced as a new baseline for vision, which not only improved the performance of multi-task computer vision but also addressed the limitations of the Vision Transformer and CNN family backbones. To further enhance segmentation, we propose fusing Meta-Former with UNet, introducing a Multi-scale Upsampling block with a level-up combination in the decoder stage to enhance texture, and adding a Convformer block, based on the Meta-Former idea, to enhance crucial local-feature information. These blocks enable the combination of global information, such as the overall shape of the polyp, with local and boundary information, which is crucial for medical segmentation decisions. Our proposed approach achieved competitive performance and obtained state-of-the-art results on the CVC-300, Kvasir, and CVC-ColonDB datasets. Apart from Kvasir-SEG, the others are out-of-distribution datasets. The implementation can be found at: https://github.com/huyquoctrinh/MetaPolyp-CBMS2023.
    Mixed Traffic Control and Coordination from Pixels. (arXiv:2302.09167v2 [cs.MA] UPDATED)
    Traffic congestion is a persistent problem in our society. Existing methods for traffic control have proven futile in alleviating current congestion levels, leading researchers to explore ideas with robot vehicles, given the increased emergence of vehicles with different levels of autonomy on our roads. This gives rise to mixed traffic control, where robot vehicles regulate human-driven vehicles through reinforcement learning (RL). However, most existing studies use precise observations that involve global information, such as environment outflow, and local information, i.e., vehicle positions and velocities. Obtaining this information requires updating existing road infrastructure with vast sensor environments and communication to potentially unwilling human drivers. We consider image observations as the alternative for mixed traffic control via RL: 1) images are ubiquitous through satellite imagery, in-car camera systems, and traffic monitoring systems; 2) images do not require a complete re-imagination of the observation space from environment to environment; and 3) images only require communication to equipment. In this work, we show robot vehicles using image observations can achieve similar performance to using precise information on environments, including ring, figure eight, intersection, merge, and bottleneck. In certain scenarios, our approach even outperforms using precision observations, e.g., up to 26% increase in average vehicle velocity in the merge environment and a 6% increase in outflow in the bottleneck environment, despite only using local traffic information as opposed to global traffic information.  ( 2 min )
    Local Energy Distribution Based Hyperparameter Determination for Stochastic Simulated Annealing. (arXiv:2304.11839v2 [cs.LG] UPDATED)
    This paper presents a local energy distribution based hyperparameter determination for stochastic simulated annealing (SSA). SSA is capable of solving combinatorial optimization problems faster than typical simulated annealing (SA), but requires a time-consuming hyperparameter search. The proposed method determines hyperparameters based on the local energy distributions of spins (probabilistic bits). The spin is a basic computing element of SSA and is graphically connected to other spins with its weights. The distribution of the local energy can be estimated based on the central limit theorem (CLT). The CLT-based normal distribution is used to determine the hyperparameters, which reduces the time complexity for hyperparameter search from O(n^3) of the conventional method to O(1). The performance of SSA with the determined hyperparameters is evaluated on the Gset and K2000 benchmarks for maximum-cut problems. The results show that the proposed method achieves mean cut values of approximately 98% of the best-known cut values.  ( 2 min )
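    The CLT step can be sketched directly: for a spin whose neighbours are independent, uniform ±1 variables, the local energy Σ_j w_ij s_j is approximately normal with mean 0 and variance Σ_j w_ij², so the statistics come straight from the weights in O(1) per spin. The `noise_scale` rule below is a hypothetical illustration of using those statistics, not the paper's exact formula:

```python
import math

def local_energy_stats(weights):
    """Under the CLT, the local energy sum_j w_ij * s_j of a spin with
    independent, uniform +/-1 neighbours is approximately normal with
    mean 0 and standard deviation sqrt(sum_j w_ij ** 2)."""
    var = sum(w * w for w in weights)
    return 0.0, math.sqrt(var)

def noise_scale(weights, k=1.0):
    """Illustrative hyperparameter rule: size the annealing noise to k
    standard deviations of the local energy distribution (an assumption
    for this sketch, not the paper's exact determination)."""
    _, sd = local_energy_stats(weights)
    return k * sd

mu, sd = local_energy_stats([1.0, -2.0, 2.0])   # sd = sqrt(1 + 4 + 4) = 3.0
```

Because this costs one pass over a spin's weights, it replaces the O(n^3) hyperparameter search of the conventional method.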
    On Mitigating the Utility-Loss in Differentially Private Learning: A new Perspective by a Geometrically Inspired Kernel Approach. (arXiv:2304.01300v2 [cs.LG] UPDATED)
    The privacy-utility tradeoff remains one of the fundamental issues of differentially private machine learning. This paper introduces a geometrically inspired kernel-based approach to mitigate the accuracy-loss issue in classification. In this approach, a representation of the affine hull of given data points is learned in Reproducing Kernel Hilbert Spaces (RKHS). This leads to a novel distance measure that hides privacy-sensitive information about individual data points and improves the privacy-utility tradeoff by significantly reducing the risk of membership inference attacks. The effectiveness of the approach is demonstrated through experiments on the MNIST dataset, the Freiburg groceries dataset, and a real biomedical dataset. It is verified that the approach remains computationally practical. The application of the approach to federated learning is considered, and it is observed that the accuracy-loss due to data being distributed is either marginal or not significantly high.  ( 2 min )
    A Meta-learning based Generalizable Indoor Localization Model using Channel State Information. (arXiv:2305.13453v2 [cs.LG] UPDATED)
    Indoor localization has gained significant attention in recent years due to its various applications in smart homes, industrial automation, and healthcare, especially since more people rely on their wireless devices for location-based services. Deep learning-based solutions have shown promising results in accurately estimating the position of wireless devices in indoor environments using wireless parameters such as Channel State Information (CSI) and Received Signal Strength Indicator (RSSI). However, despite the success of deep learning-based approaches in achieving high localization accuracy, these models suffer from a lack of generalizability and cannot be readily deployed to new environments or operate in dynamic environments without retraining. In this paper, we propose meta-learning-based localization models to address the lack of generalizability that persists in conventionally trained DL-based localization models. Furthermore, since meta-learning algorithms require diverse datasets from several different scenarios, which can be hard to collect in the context of localization, we design and propose a new meta-learning algorithm, TB-MAML (Task Biased Model Agnostic Meta Learning), intended to further improve generalizability when the dataset is limited. Lastly, we evaluate the performance of TB-MAML-based localization against conventionally trained localization models and localization done using other meta-learning algorithms.  ( 2 min )
    ASPEST: Bridging the Gap Between Active Learning and Selective Prediction. (arXiv:2304.03870v2 [cs.LG] UPDATED)
    Selective prediction aims to learn a reliable model that abstains from making predictions when the model uncertainty is high. These predictions can then be deferred to a human expert for further evaluation. In many real-world scenarios, the distribution of test data is different from the training data. This results in more inaccurate predictions, necessitating increased human labeling, which can be difficult and expensive. Active learning circumvents this by only querying the most informative examples and, in several cases, has been shown to lower the overall labeling effort. In this work, we bridge selective prediction and active learning, proposing a new learning paradigm called active selective prediction which learns to query more informative samples from the shifted target domain while increasing accuracy and coverage. For this new problem, we propose a simple but effective solution, ASPEST, that utilizes ensembles of model snapshots, with self-training on their aggregated outputs as pseudo-labels. Extensive experiments on numerous image, text, and structured datasets, particularly those suffering from domain shifts, demonstrate that our proposed method can significantly outperform prior work on selective prediction and active learning (e.g. on the MNIST$\to$SVHN benchmark with the labeling budget of $100$, ASPEST improves the AUC metric from $79.36\%$ to $88.84\%$) and achieves more optimal utilization of humans in the loop.  ( 2 min )
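    The core mechanism, aggregating model snapshots into pseudo-labels while abstaining on uncertain samples, can be sketched as follows. The confidence-threshold selection rule is an illustrative assumption, not ASPEST's exact criterion:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def ensemble_pseudo_labels(snapshot_logits, threshold=0.8):
    """Average the softmax outputs of several model snapshots; keep a
    pseudo-label only when the averaged confidence clears the threshold
    (low-confidence samples are left unlabeled, i.e. deferred)."""
    labels = []
    for per_sample in snapshot_logits:            # one entry per test sample
        probs = [softmax(l) for l in per_sample]  # one row per snapshot
        n_cls = len(probs[0])
        avg = [sum(p[c] for p in probs) / len(probs) for c in range(n_cls)]
        conf = max(avg)
        labels.append(avg.index(conf) if conf >= threshold else None)
    return labels

# Two samples, two snapshots, three classes: the first is confidently class 2
# and gets a pseudo-label; the second stays unlabeled (deferred to a human).
logits = [
    [[0.1, 0.2, 3.0], [0.0, 0.1, 2.8]],
    [[1.0, 1.1, 0.9], [1.1, 0.9, 1.0]],
]
pseudo = ensemble_pseudo_labels(logits)
```

The unlabeled (deferred) samples are exactly the ones active selective prediction would prioritize for human labeling.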
    MalProtect: Stateful Defense Against Adversarial Query Attacks in ML-based Malware Detection. (arXiv:2302.10739v2 [cs.LG] UPDATED)
    ML models are known to be vulnerable to adversarial query attacks. In these attacks, queries are iteratively perturbed towards a particular class without any knowledge of the target model besides its output. The prevalence of remotely-hosted ML classification models and Machine-Learning-as-a-Service platforms means that query attacks pose a real threat to the security of these systems. To deal with this, stateful defenses have been proposed to detect query attacks and prevent the generation of adversarial examples by monitoring and analyzing the sequence of queries received by the system. Several stateful defenses have been proposed in recent years. However, these defenses rely solely on similarity or out-of-distribution detection methods that may be effective in other domains. In the malware detection domain, the methods to generate adversarial examples are inherently different, and therefore we find that such detection mechanisms are significantly less effective. Hence, in this paper, we present MalProtect, which is a stateful defense against query attacks in the malware detection domain. MalProtect uses several threat indicators to detect attacks. Our results show that it reduces the evasion rate of adversarial query attacks by 80+\% in Android and Windows malware, across a range of attacker scenarios. In the first evaluation of its kind, we show that MalProtect outperforms prior stateful defenses, especially under the peak adversarial threat.  ( 2 min )
    Probabilistic relations for modelling epistemic and aleatoric uncertainty: semantics and automated reasoning with theorem proving. (arXiv:2303.09692v2 [cs.LO] UPDATED)
    Probabilistic programming combines general computer programming, statistical inference, and formal semantics to help systems make decisions when facing uncertainty. Probabilistic programs are ubiquitous, including having a significant impact on machine intelligence. While many probabilistic algorithms have been used in practice in different domains, their automated verification based on formal semantics is still a relatively new research area. In the last two decades, it has attracted much interest. Many challenges, however, remain. The work presented in this paper, probabilistic relations, takes a step towards our vision to tackle these challenges. Our work is based on Hehner's predicative probabilistic programming, but there are several obstacles to the broader adoption of his work. Our contributions here include (1) the formalisation of its syntax and semantics by introducing an Iverson bracket notation to separate relations from arithmetic; (2) the formalisation of relations using Unifying Theories of Programming (UTP) and probabilities outside the brackets using summation over the topological space of the real numbers; (3) the constructive semantics for probabilistic loops using Kleene's fixed-point theorem; (4) the enrichment of its semantics from distributions to subdistributions and superdistributions to deal with the constructive semantics; (5) the unique fixed-point theorem to simplify the reasoning about probabilistic loops; and (6) the mechanisation of our theory in Isabelle/UTP, an implementation of UTP in Isabelle/HOL, for automated reasoning using theorem proving. We demonstrate our work with six examples, including problems in robot localisation, classification in machine learning, and the termination of probabilistic loops.  ( 3 min )
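    The Iverson-bracket device that separates relations from arithmetic can be sketched in a few lines: [P] maps a predicate into {0, 1}, so probabilities become ordinary sums over outcomes. A toy discrete example (plain Python, not the Isabelle/UTP mechanization):

```python
def iverson(p):
    """Iverson bracket [P]: 1 if the predicate holds, 0 otherwise. It lets a
    relation appear inside arithmetic, as in Hehner-style predicative
    probabilistic programming."""
    return 1 if p else 0

def expectation(dist, pred):
    """E[[pred]] equals the probability of pred under a discrete
    distribution, with the bracket pulled inside the sum."""
    return sum(prob * iverson(pred(x)) for x, prob in dist.items())

# A fair die: the probability of an even outcome, via the Iverson bracket.
die = {i: 1 / 6 for i in range(1, 7)}
p_even = expectation(die, lambda x: x % 2 == 0)
```

The paper's formalisation generalises this idea to relations over program states and to summation over the reals, but the bracket's role is the same: keeping the logical part and the arithmetic part of a probabilistic relation cleanly separated.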
    Joint optimization of a $\beta$-VAE for ECG task-specific feature extraction. (arXiv:2304.06476v2 [eess.SP] UPDATED)
    Electrocardiography is the most common method to investigate the condition of the heart through the observation of cardiac rhythm and electrical activity, for both diagnosis and monitoring purposes. Analysis of electrocardiograms (ECGs) is commonly performed through the investigation of specific patterns, which are visually recognizable by trained physicians and are known to reflect cardiac (dis)function. In this work we study the use of $\beta$-variational autoencoders (VAEs) as an explainable feature extractor, and improve on its predictive capacities by jointly optimizing signal reconstruction and cardiac function prediction. The extracted features are then used for cardiac function prediction using logistic regression. The method is trained and tested on data from 7255 patients, who were treated for acute coronary syndrome at the Leiden University Medical Center between 2010 and 2021. The results show that our method significantly improved prediction and explainability compared to a vanilla $\beta$-VAE, while still yielding similar reconstruction performance.  ( 2 min )
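    The abstract does not spell out the joint objective; a plausible form (an assumption on our part, not taken from the paper) combines the usual $\beta$-VAE terms with a weighted prediction loss for the logistic-regression head:

```latex
\mathcal{L}(\theta,\phi,w) \;=\;
\underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[\lVert x - \hat{x}_\theta(z) \rVert_2^2\right]}_{\text{reconstruction}}
\;+\; \beta\, D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\Vert\, p(z)\right)
\;+\; \lambda\, \mathrm{BCE}\!\left(y,\, \sigma(w^\top z)\right)
```

    Here $\lambda$ is a hypothetical weight coupling the task-specific prediction term (binary cross-entropy on a cardiac function label $y$) to the reconstruction and KL terms, while $\beta$ trades off reconstruction fidelity against disentanglement of the latent features.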
    Contextual Combinatorial Bandits with Probabilistically Triggered Arms. (arXiv:2303.17110v2 [cs.LG] UPDATED)
    We study contextual combinatorial bandits with probabilistically triggered arms (C$^2$MAB-T) under a variety of smoothness conditions that capture a wide range of applications, such as contextual cascading bandits and contextual influence maximization bandits. Under the triggering probability modulated (TPM) condition, we devise the C$^2$-UCB-T algorithm and propose a novel analysis that achieves an $\tilde{O}(d\sqrt{KT})$ regret bound, removing a potentially exponentially large factor $O(1/p_{\min})$, where $d$ is the dimension of contexts, $p_{\min}$ is the minimum positive probability that any arm can be triggered, and batch-size $K$ is the maximum number of arms that can be triggered per round. Under the variance modulated (VM) or triggering probability and variance modulated (TPVM) conditions, we propose a new variance-adaptive algorithm VAC$^2$-UCB and derive a regret bound $\tilde{O}(d\sqrt{T})$, which is independent of the batch-size $K$. As a valuable by-product, our analysis technique and variance-adaptive algorithm can be applied to the CMAB-T and C$^2$MAB setting, improving existing results there as well. We also include experiments that demonstrate the improved performance of our algorithms compared with benchmark algorithms on synthetic and real-world datasets.  ( 2 min )
    Fundus vascular image segmentation based on multiple attention mechanisms and deep learning. (arXiv:2305.03617v2 [eess.IV] UPDATED)
    Accurately segmenting blood vessels in retinal fundus images is crucial for the early screening, diagnosis, and evaluation of some ocular diseases, yet the segmentation task carries nontrivial uncertainty due to factors such as significant light variations, uneven curvilinear structures, and non-uniform contrast. As a result, a useful approach based on multiple attention mechanisms and deep learning is proposed to accurately detect blood vessels in retinal fundus images. To compensate for the loss of scene information and enrich contextual information, an attention fusion mechanism that combines channel attention with Transformer-based spatial attention is employed to extract various features of blood vessels from retinal fundus images in both the spatial and channel dimensions. Subsequently, a unique spatial attention mechanism is introduced in the skip connection to filter out redundant information and noise from low-level features, thus enabling better integration with high-level features. In addition, a DropOut layer is employed to randomly discard some neurons, which can prevent overfitting of the deep learning network and improve its generalization performance.  ( 2 min )
    OpenAGI: When LLM Meets Domain Experts. (arXiv:2304.04370v3 [cs.AI] UPDATED)
    Human intelligence has the remarkable ability to assemble basic skills into complex ones so as to solve complex tasks. This ability is equally important for Artificial Intelligence (AI), and thus, we assert that in addition to the development of large, comprehensive intelligent models, it is equally crucial to equip such models with the capability to harness various domain-specific expert models for complex task-solving in the pursuit of Artificial General Intelligence (AGI). Recent developments in Large Language Models (LLMs) have demonstrated remarkable learning and reasoning abilities, making them promising as a controller to select, synthesize, and execute external models to solve complex tasks. In this project, we develop OpenAGI, an open-source AGI research platform, specifically designed to offer complex, multi-step tasks and accompanied by task-specific datasets, evaluation metrics, and a diverse range of extensible models. OpenAGI formulates complex tasks as natural language queries, serving as input to the LLM. The LLM subsequently selects, synthesizes, and executes models provided by OpenAGI to address the task. Furthermore, we propose a Reinforcement Learning from Task Feedback (RLTF) mechanism, which uses the task-solving result as feedback to improve the LLM's task-solving ability. Thus, the LLM is responsible for synthesizing various external models for solving complex tasks, while RLTF provides feedback to improve its task-solving ability, enabling a feedback loop for self-improving AI. We believe that the paradigm of LLMs operating various expert models for complex task-solving is a promising approach towards AGI. To facilitate the community's long-term improvement and evaluation of AGI's ability, we open-source the code, benchmark, and evaluation methods of the OpenAGI project at https://github.com/agiresearch/OpenAGI.  ( 3 min )
    DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining. (arXiv:2302.12445v2 [cs.LG] UPDATED)
    Communication scheduling has been shown to be effective in accelerating distributed training, which enables all-reduce communications to be overlapped with backpropagation computations. This has been commonly adopted in popular distributed deep learning frameworks. However, there exist two fundamental problems: (1) excessive startup latency proportional to the number of workers for each all-reduce operation; (2) sub-optimal training performance due to the dependency and synchronization requirements of the feed-forward computation in the next iteration. We propose a novel scheduling algorithm, DeAR, that decouples the all-reduce primitive into two continuous operations, which overlap with both backpropagation and feed-forward computations without extra communications. We further design a practical tensor fusion algorithm to improve the training performance. Experimental results with five popular models show that DeAR achieves up to 83% and 15% training speedup over the state-of-the-art solutions on a 64-GPU cluster with 10Gb/s Ethernet and 100Gb/s InfiniBand interconnects, respectively.  ( 2 min )
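    An all-reduce is equivalent to a reduce-scatter followed by an all-gather, which is the split DeAR's decoupling exploits: the first half can overlap with backpropagation and the second with the next iteration's feed-forward pass. A minimal functional sketch of the two primitives (plain Python lists in place of tensors, gradient length assumed divisible by the worker count, no actual communication modeled):

```python
def reduce_scatter(worker_grads):
    """Each of the W workers ends up owning the fully reduced chunk w of the
    gradient, i.e. chunk w summed elementwise across all workers."""
    W = len(worker_grads)
    chunk = len(worker_grads[0]) // W  # assumes length divisible by W
    return [
        [sum(g[i] for g in worker_grads)
         for i in range(w * chunk, (w + 1) * chunk)]
        for w in range(W)
    ]

def all_gather(owned_chunks):
    """Concatenate every worker's reduced chunk so that all workers
    hold the full summed gradient."""
    full = [x for c in owned_chunks for x in c]
    return [list(full) for _ in owned_chunks]
```

    Composing the two reproduces the result of a single all-reduce, while each half can be scheduled independently.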
    Bounding the Optimal Value Function in Compositional Reinforcement Learning. (arXiv:2303.02557v2 [cs.LG] UPDATED)
    In the field of reinforcement learning (RL), agents are often tasked with solving a variety of problems differing only in their reward functions. In order to quickly obtain solutions to unseen problems with new reward functions, a popular approach involves functional composition of previously solved tasks. However, previous work using such functional composition has primarily focused on specific instances of composition functions whose limiting assumptions allow for exact zero-shot composition. Our work unifies these examples and provides a more general framework for compositionality in both standard and entropy-regularized RL. We find that, for a broad class of functions, the optimal solution for the composite task of interest can be related to the known primitive task solutions. Specifically, we present double-sided inequalities relating the optimal composite value function to the value functions for the primitive tasks. We also show that the regret of using a zero-shot policy can be bounded for this class of functions. The derived bounds can be used to develop clipping approaches for reducing uncertainty during training, allowing agents to quickly adapt to new tasks.  ( 2 min )
    Learning the Delay Using Neural Delay Differential Equations. (arXiv:2304.01329v2 [math.OC] UPDATED)
    The intersection of machine learning and dynamical systems has generated considerable interest recently. Neural Ordinary Differential Equations (NODEs) represent a rich overlap between these fields. In this paper, we develop a continuous time neural network approach based on Delay Differential Equations (DDEs). Our model uses the adjoint sensitivity method to learn the model parameters and delay directly from data. Our approach is inspired by that of NODEs and extends earlier neural DDE models, which have assumed that the value of the delay is known a priori. We perform a sensitivity analysis on our proposed approach and demonstrate its ability to learn DDE parameters from benchmark systems. We conclude our discussion with potential future directions and applications.  ( 2 min )
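    Gradient-based learning of the delay via the adjoint method is the paper's contribution; the forward pass such a model differentiates through is just numerical integration of a DDE. A minimal fixed-step Euler sketch (our simplification, not the authors' code; the delayed state is linearly interpolated from the stored trajectory):

```python
def integrate_dde(f, history, tau, t_end, dt):
    """Fixed-step Euler integration of x'(t) = f(x(t), x(t - tau)).
    `history` supplies x(t) for t <= 0; delayed values for t > 0 are
    linearly interpolated from the trajectory computed so far."""
    ts = [0.0]
    xs = [history(0.0)]

    def delayed(t):
        td = t - tau
        if td <= 0.0:
            return history(td)
        # linear interpolation on the stored time grid
        k = min(int(td / dt), len(xs) - 2)
        w = (td - ts[k]) / dt
        return xs[k] + w * (xs[k + 1] - xs[k])

    while ts[-1] < t_end - 1e-12:
        t, x = ts[-1], xs[-1]
        xs.append(x + dt * f(x, delayed(t)))
        ts.append(t + dt)
    return ts, xs
```

    For example, with constant history x(t) = 1 for t <= 0 and dynamics x'(t) = -x(t - 1), the state decays linearly to zero over the first unit of time.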
    A Markovian Formalism for Active Querying. (arXiv:2306.08001v1 [cs.LG])
    Active learning algorithms have been an integral part of recent advances in artificial intelligence. However, the research in the field is widely varying and lacks an overall organizing lens. We outline a Markovian formalism for the field of active learning and survey the literature to demonstrate the organizing capability of our proposed formalism. Our formalism takes a partially observable Markovian system approach to the active learning process as a whole. We specifically outline how querying, dataset augmentation, reward updates, and other aspects of active learning can be viewed as transitions between meta-states in a Markovian system, and give direction on how other aspects of active learning can fit into our formalism.  ( 2 min )
    Flexible Channel Dimensions for Differentiable Architecture Search. (arXiv:2306.08021v1 [cs.LG])
    Finding optimal channel dimensions (i.e., the number of filters in DNN layers) is essential to design DNNs that perform well under computational resource constraints. Recent work in neural architecture search aims at automating the optimization of the DNN model implementation. However, existing neural architecture search methods for channel dimensions rely on fixed search spaces, which prevents achieving an efficient and fully automated solution. In this work, we propose a novel differentiable neural architecture search method with an efficient dynamic channel allocation algorithm to enable a flexible search space for channel dimensions. We show that the proposed framework is able to find DNN architectures that are equivalent to previous methods in task accuracy and inference latency for the CIFAR-10 dataset with an improvement of $1.3-1.7\times$ in GPU-hours and $1.5-1.7\times$ in the memory requirements during the architecture search stage. Moreover, the proposed framework does not require a well-engineered search space a priori, which is an important step towards fully automated design of DNN architectures.  ( 2 min )
    Domain-Aware Few-Shot Learning for Optical Coherence Tomography Noise Reduction. (arXiv:2306.08102v1 [eess.IV])
    Speckle noise has long been an extensively studied problem in medical imaging. In recent years, there have been significant advances in leveraging deep learning methods for noise reduction. Nevertheless, adaptation of supervised learning models to unseen domains remains a challenging problem. Specifically, deep neural networks (DNNs) trained for computational imaging tasks are vulnerable to changes in the acquisition system's physical parameters, such as: sampling space, resolution, and contrast. Even within the same acquisition system, performance degrades across datasets of different biological tissues. In this work, we propose a few-shot supervised learning framework for optical coherence tomography (OCT) noise reduction, that offers a dramatic increase in training speed and requires only a single image, or part of an image, and a corresponding speckle suppressed ground truth, for training. Furthermore, we formulate the domain shift problem for diverse OCT imaging systems, and prove that the output resolution of a despeckling trained model is determined by the source domain resolution. We also provide possible remedies. We propose different practical implementations of our approach, verify and compare their applicability, robustness, and computational efficiency. Our results demonstrate significant potential for generally improving sample complexity, generalization, and time efficiency, for coherent and non-coherent noise reduction via supervised learning models, that can also be leveraged for other real-time computer vision applications.  ( 2 min )
    Safe Use of Neural Networks. (arXiv:2306.08086v1 [eess.SP])
    Neural networks in modern communication systems can be susceptible to internal numerical errors that can drastically affect decision results. Such structures are composed of many sections, each of which generally contains weighting operations and activation function evaluations. Safe use comes from methods employing number-based codes that can detect arithmetic errors in the network's processing steps. Each set of operations generates parity values dictated by a code in two ways. One set of parities is obtained from a section's outputs, while a second comparable set is developed directly from the original inputs. The parity values protecting the activation functions involve a Taylor series approximation to the activation functions. We focus on long numerically-based convolutional codes because of the large size of the data sets. The codes are based on Discrete Fourier Transform kernels, and there are many design options available. Mathematical program simulations show our error-detecting techniques are effective and efficient.  ( 2 min )
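    The paper's codes are DFT-based convolutional codes with Taylor-series parities for the activation functions; as a much simpler illustration of the same two-way parity idea, the sketch below (a deliberate simplification, not the paper's scheme) checks a single weighting section with one checksum computed from the outputs and recomputed directly from the inputs:

```python
def checked_linear(W, b, x, tol=1e-9):
    """Linear section y = W x + b with an ABFT-style checksum: the parity
    sum(y) is recomputed directly from the inputs via the precomputed
    row-sum vector s = 1^T W, flagging arithmetic faults in the section."""
    y = [sum(wij * xj for wij, xj in zip(row, x)) + bi
         for row, bi in zip(W, b)]
    s = [sum(col) for col in zip(*W)]  # 1^T W, i.e. column sums of W
    parity_from_inputs = sum(si * xi for si, xi in zip(s, x)) + sum(b)
    parity_from_outputs = sum(y)
    if abs(parity_from_outputs - parity_from_inputs) > tol:
        raise ArithmeticError("parity mismatch: arithmetic fault detected")
    return y
```

    A corrupted multiply-accumulate in the main computation shifts `sum(y)` away from the independently derived input-side parity, so the mismatch is caught before the result propagates.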
    DTW k-means clustering for fault detection in photovoltaic modules. (arXiv:2306.08003v1 [cs.LG])
    The increase in the use of photovoltaic (PV) energy in the world has shown that the useful life and maintenance of a PV plant directly depend on the ability to quickly detect severe faults on a PV plant. To solve this detection problem, data-based approaches have been proposed in the literature. However, these previous solutions consider only the specific behavior of one or a few faults. Most of these approaches can be qualified as supervised, requiring an enormous labelling effort (fault types clearly identified in each technology). In addition, most of them are validated on PV cells or a single PV module, which is hardly applicable to large-scale PV plants considering their complexity. Alternatively, some well-known unsupervised data-based approaches try to detect anomalies but are not able to precisely identify the type of fault. The most performant of these methods do manage to efficiently group healthy panels and separate them from faulty panels. Accordingly, this article presents an unsupervised approach called DTW K-means. This approach takes advantage of both the dynamic time warping (DTW) metric and the K-means clustering algorithm as a data-driven approach. The results of this mixed method on a PV string are compared to diagnostic labels established by visual inspection of the panels.  ( 2 min )
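    The DTW metric at the core of the approach is a short dynamic program; a minimal sketch for 1-D sequences follows (the K-means side, e.g. assigning panel signals to the nearest centroid under DTW, is omitted):

```python
def dtw(a, b):
    """Dynamic time warping distance between two 1-D sequences:
    the classic O(len(a) * len(b)) dynamic program with
    absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # D[i][j] = cost of the best alignment of a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # stretch a
                                 D[i][j - 1],      # stretch b
                                 D[i - 1][j - 1])  # match step
    return D[n][m]
```

    Unlike Euclidean distance, DTW tolerates local time shifts, so two panel signals with the same shape but slightly misaligned events still compare as similar.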
    Graph Structure and Feature Extrapolation for Out-of-Distribution Generalization. (arXiv:2306.08076v1 [cs.LG])
    Out-of-distribution (OOD) generalization deals with the prevalent learning scenario where test distribution shifts from training distribution. With rising application demands and inherent complexity, graph OOD problems call for specialized solutions. While data-centric methods exhibit performance enhancements on many generic machine learning tasks, there is a notable absence of data augmentation methods tailored for graph OOD generalization. In this work, we propose to achieve graph OOD generalization with the novel design of non-Euclidean-space linear extrapolation. The proposed augmentation strategy extrapolates both structure and feature spaces to generate OOD graph data. Our design tailors OOD samples for specific shifts without corrupting underlying causal mechanisms. Theoretical analysis and empirical results evidence the effectiveness of our method in solving target shifts, showing substantial and constant improvements across various graph OOD tasks.  ( 2 min )
    Privacy Inference-Empowered Stealthy Backdoor Attack on Federated Learning under Non-IID Scenarios. (arXiv:2306.08011v1 [cs.LG])
    Federated learning (FL) naturally faces the problem of data heterogeneity in real-world scenarios, but this is often overlooked by studies on FL security and privacy. On the one hand, the effectiveness of backdoor attacks on FL may drop significantly under non-IID scenarios. On the other hand, malicious clients may steal private data through privacy inference attacks. Therefore, it is necessary to have a comprehensive perspective of data heterogeneity, backdoor, and privacy inference. In this paper, we propose a novel privacy inference-empowered stealthy backdoor attack (PI-SBA) scheme for FL under non-IID scenarios. Firstly, a diverse data reconstruction mechanism based on generative adversarial networks (GANs) is proposed to produce a supplementary dataset, which can improve the attacker's local data distribution and support more sophisticated strategies for backdoor attacks. Based on this, we design a source-specified backdoor learning (SSBL) strategy as a demonstration, allowing the adversary to arbitrarily specify which classes are susceptible to the backdoor trigger. Since the PI-SBA has an independent poisoned data synthesis process, it can be integrated into existing backdoor attacks to improve their effectiveness and stealthiness in non-IID scenarios. Extensive experiments based on MNIST, CIFAR10 and Youtube Aligned Face datasets demonstrate that the proposed PI-SBA scheme is effective in non-IID FL and stealthy against state-of-the-art defense methods.  ( 2 min )
    Causal Feature Engineering of Price Directions of Cryptocurrencies using Dynamic Bayesian Networks. (arXiv:2306.08157v1 [cs.LG])
    Cryptocurrencies have gained popularity across various sectors, especially in finance and investment. The popularity is partly due to their unique specifications originating from blockchain-related characteristics such as privacy, decentralisation, and untraceability. Despite their growing popularity, cryptocurrencies remain a high-risk investment due to their price volatility and uncertainty. The inherent volatility in cryptocurrency prices, coupled with internal cryptocurrency-related factors and external influential global economic factors makes predicting their prices and price movement directions challenging. Nevertheless, the knowledge obtained from predicting the direction of cryptocurrency prices can provide valuable guidance for investors in making informed investment decisions. To address this issue, this paper proposes a dynamic Bayesian network (DBN) approach, which can model complex systems in multivariate settings, to predict the price movement direction of five popular altcoins (cryptocurrencies other than Bitcoin) in the next trading day. The efficacy of the proposed model in predicting cryptocurrency price directions is evaluated from two perspectives. Firstly, our proposed approach is compared to two baseline models, namely an auto-regressive integrated moving average and support vector regression. Secondly, from a feature engineering point of view, the impact of twenty-three different features, grouped into four categories, on the DBN's prediction performance is investigated. The experimental results demonstrate that the DBN significantly outperforms the baseline models. In addition, among the groups of features, technical indicators are found to be the most effective predictors of cryptocurrency price directions.  ( 3 min )
    Tune As You Scale: Hyperparameter Optimization For Compute Efficient Training. (arXiv:2306.08055v1 [cs.LG])
    Hyperparameter tuning of deep learning models can lead to order-of-magnitude performance gains for the same amount of compute. Despite this, systematic tuning is uncommon, particularly for large models, which are expensive to evaluate and tend to have many hyperparameters, necessitating difficult judgment calls about tradeoffs, budgets, and search bounds. To address these issues and propose a practical method for robustly tuning large models, we present Cost-Aware Pareto Region Bayesian Search (CARBS), a Bayesian optimization algorithm that performs local search around the performance-cost Pareto frontier. CARBS does well even in unbounded search spaces with many hyperparameters, learns scaling relationships so that it can tune models even as they are scaled up, and automates much of the "black magic" of tuning. Among our results, we effectively solve the entire ProcGen benchmark just by tuning a simple baseline (PPO, as provided in the original ProcGen paper). We also reproduce the model size vs. training tokens scaling result from the Chinchilla project (Hoffmann et al. 2022), while simultaneously discovering scaling laws for every other hyperparameter, via an easy automated process that uses significantly less compute and is applicable to any deep learning problem (not just language models).  ( 2 min )
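    CARBS searches locally around the performance-cost Pareto frontier; the dominance filter that identifies such a frontier can be sketched as follows (an illustrative helper, not the authors' algorithm, assuming lower cost and higher score are both better):

```python
def pareto_front(points):
    """Return the (cost, score) tuples not dominated by any other point.
    Point q dominates p when q.cost <= p.cost and q.score >= p.score,
    with q distinct from p; exact duplicates are both kept."""
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] <= p[0] and q[1] >= p[1]
            for q in points
        )
        if not dominated:
            front.append(p)
    return front
```

    A local search restricted to the neighborhood of these points spends its budget only on configurations offering a favorable cost-performance trade-off.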
    Self-supervised Deep Hyperspectral Inpainting with the Sparsity and Low-Rank Considerations. (arXiv:2306.08128v1 [eess.IV])
    Hyperspectral images are typically composed of hundreds of narrow and contiguous spectral bands, each containing information about the material composition of the imaged scene. However, these images can be affected by various sources of noise, distortions, or data losses, which can significantly degrade their quality and usefulness. To address these problems, we introduce two novel self-supervised Hyperspectral Image (HSI) inpainting algorithms: Low Rank and Sparsity Constraint Plug-and-Play (LRS-PnP), and its extension LRS-PnP-DIP, which adds strong learning capability while remaining free of external training data. We conduct a stability analysis under mild assumptions that guarantees the algorithm converges, which is particularly helpful for practical applications. Extensive experiments demonstrate that the proposed solution is able to produce visually and qualitatively superior inpainting results, achieving state-of-the-art performance. The code for reproducing the results is available at \url{https://github.com/shuoli0708/LRS-PnP-DIP}.
    Software Supply Chain Vulnerabilities Detection in Source Code: Performance Comparison between Traditional and Quantum Machine Learning Algorithms. (arXiv:2306.08060v1 [cs.CR])
    The software supply chain (SSC) attack has become one of the crucial issues that is increasing rapidly with the advancement of the software development domain. In general, SSC attacks executed during the software development process lead to vulnerabilities in software products, targeting downstream customers and even involved stakeholders. Machine learning approaches have proven effective in detecting and preventing software security vulnerabilities, and emerging quantum machine learning may also be promising in addressing SSC attacks. Considering the distinction between traditional and quantum machine learning, performance may vary based on the proportions of the dataset used for experimentation. In this paper, we conduct a comparative analysis between quantum neural networks (QNN) and conventional neural networks (NN) on a software supply chain attack dataset known as ClaMP. Our goal is to distinguish the performance of QNN and NN; to conduct the experiment, we develop two different models, a QNN using Pennylane and a conventional NN using TensorFlow and Keras. We evaluate the performance of both models with different proportions of the ClaMP dataset in terms of F1 score, recall, precision, and accuracy. We also measure the execution time to check the efficiency of both models. The results indicate that the execution time of the QNN is slower than that of the NN at higher percentages of the dataset. Due to recent advancements in QNN, larger-scale experiments should be carried out in our future research to characterize both models accurately.
    ppAURORA: Privacy Preserving Area Under Receiver Operating Characteristic and Precision-Recall Curves. (arXiv:2102.08788v3 [cs.LG] UPDATED)
    Computing an AUC as a performance measure to compare the quality of different machine learning models is one of the final steps of many research projects. Many of these methods are trained on privacy-sensitive data and there are several different approaches like $\epsilon$-differential privacy, federated machine learning and cryptography if the datasets cannot be shared or used jointly at one place for training and/or testing. In this setting, it can also be a problem to compute the global AUC, since the labels might also contain privacy-sensitive information. There have been approaches based on $\epsilon$-differential privacy to address this problem, but to the best of our knowledge, no exact privacy preserving solution has been introduced. In this paper, we propose an MPC-based solution, called ppAURORA, with private merging of individually sorted lists from multiple sources to compute the exact AUC as one could obtain on the pooled original test samples. With ppAURORA, the computation of the exact area under precision-recall and receiver operating characteristic curves is possible even when ties between prediction confidence values exist. We use ppAURORA to evaluate two different models predicting acute myeloid leukemia therapy response and heart disease, respectively. We also assess its scalability via synthetic data experiments. All these experiments show that we efficiently and privately compute the exact same AUC with both evaluation metrics as one can obtain on the pooled test samples in plaintext according to the semi-honest adversary setting.
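    The MPC protocol itself is beyond a short sketch, but the plaintext quantity ppAURORA reproduces exactly, the tie-aware ROC AUC, equals the normalized Mann-Whitney U statistic, with tied prediction confidences counted as one half:

```python
def exact_auc(scores, labels):
    """ROC AUC via the Mann-Whitney U statistic: the fraction of
    (positive, negative) pairs ranked correctly, with ties worth 1/2.
    Equivalent to the trapezoidal area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

    This O(n_pos * n_neg) form is for illustration; the private protocol achieves the same value on merged, individually sorted score lists without revealing labels or scores.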
    Expanding Versatility of Agile Locomotion through Policy Transitions Using Latent State Representation. (arXiv:2306.08224v1 [cs.RO])
    This paper proposes the transition-net, a robust transition strategy that expands the versatility of robot locomotion in the real-world setting. To this end, we start by distributing the complexity of different gaits into dedicated locomotion policies applicable to real-world robots. Next, we expand the versatility of the robot by unifying the policies with robust transitions into a single coherent meta-controller by examining the latent state representations. Our approach enables the robot to iteratively expand its skill repertoire and robustly transition between any policy pair in a library. In our framework, adding new skills does not introduce any process that alters the previously learned skills. Moreover, training of a locomotion policy takes less than an hour with a single consumer GPU. Our approach is effective in the real world and achieves a 19% higher average success rate for the most challenging transition pairs in our experiments compared to existing approaches.
    Why Using Either Aggregated Features or Adjacency Lists in Directed or Undirected Graph? Empirical Study and Simple Classification Method. (arXiv:2306.08274v1 [cs.LG])
    Node classification is one of the hottest tasks in graph analysis. In this paper, we focus on the choices of node representations (aggregated features vs. adjacency lists) and the edge direction of an input graph (directed vs. undirected), which have a large influence on classification results. We present the first empirical study to benchmark the performance of various GNNs that use either combination of node representations and edge directions. Our experiments demonstrate that no single combination stably achieves state-of-the-art results across datasets, which indicates that we need to select appropriate combinations depending on the characteristics of datasets. In response, we propose a simple yet holistic classification method A2DUG which leverages all combinations of node representation variants in directed and undirected graphs. We demonstrate that A2DUG stably performs well on various datasets. Surprisingly, it largely outperforms the current state-of-the-art methods in several datasets. This result validates the importance of the adaptive effect control on the combinations of node representations and edge directions.
    LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting. (arXiv:2306.08259v1 [cs.LG])
    Traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning in capturing non-linear patterns of traffic data. However, the promising results achieved on current public datasets may not be applicable to practical scenarios due to limitations within these datasets. First, their limited sizes may not reflect the real-world scale of traffic networks. Second, the temporal coverage of these datasets is typically short, posing hurdles in studying long-term patterns and acquiring sufficient samples for training deep models. Third, these datasets often lack adequate metadata for sensors, which compromises the reliability and interpretability of the data. To mitigate these limitations, we introduce the LargeST benchmark dataset. It encompasses a total of 8,600 sensors with a 5-year time coverage and includes comprehensive metadata. Using LargeST, we perform in-depth data analysis to extract data insights, benchmark well-known baselines in terms of their performance and efficiency, and identify challenges as well as opportunities for future research. We release the datasets and baseline implementations at: https://github.com/liuxu77/LargeST.
    Detection and classification of faults aimed at preventive maintenance of PV systems. (arXiv:2306.08004v1 [cs.LG])
    Diagnosis in PV systems aims to detect, locate and identify faults. Diagnosing these faults is vital to guarantee energy production and extend the useful life of PV power plants. In the literature, multiple machine learning approaches have been proposed for this purpose. However, few of these works have paid special attention to the detection of fine faults and to the specialized process of extraction and selection of features for their classification. A fine fault is one whose characteristic signature is difficult to distinguish from that of a healthy panel. As a contribution to the detection of fine faults (especially of the snail trail type), this article proposes an innovative approach based on the Random Forest (RF) algorithm. This approach uses a complex feature extraction and selection method that improves the computational time of fault classification while maintaining high accuracy.  ( 2 min )
    INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation. (arXiv:2306.08162v1 [cs.CL])
    We introduce a method that dramatically reduces fine-tuning VRAM requirements and rectifies quantization errors in quantized Large Language Models. First, we develop an extremely memory-efficient fine-tuning (EMEF) method for quantized models using Low-Rank Adaptation (LoRA), and drawing upon it, we construct an error-correcting algorithm designed to minimize errors induced by the quantization process. Our method reduces the memory requirements by up to 5.6 times, which enables fine-tuning a 7 billion parameter Large Language Model (LLM) on consumer laptops. At the same time, we propose a Low-Rank Error Correction (LREC) method that exploits the added LoRA layers to ameliorate the gap between the quantized model and its floating-point counterpart. Our error correction framework leads to a fully functional INT2 quantized LLM with the capacity to generate coherent English text. To the best of our knowledge, this is the first INT2 Large Language Model that has been able to reach such performance. The overhead of our method is merely a 1.05 times increase in model size, which translates to an effective precision of INT2.1. Also, our method readily generalizes to other quantization standards, such as INT3, INT4, and INT8, restoring their lost performance, which marks a significant milestone in the field of model quantization. The strategies delineated in this paper hold promising implications for the future development and optimization of quantized models, marking a pivotal shift in the landscape of low-resource machine learning computations.
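    The paper trains the added LoRA layers to absorb quantization error; the toy sketch below instead takes a direct rank-1 approximation of the error matrix via power iteration, just to illustrate why a low-rank correction term can recover much of the lost precision. The 4-level grid and per-row scaling here are our assumptions, not the paper's exact scheme:

```python
def quantize_int2(W, levels=(-1.5, -0.5, 0.5, 1.5)):
    """Toy symmetric 2-bit quantization: scale each row onto a 4-level
    grid and round each entry to the nearest level."""
    Q = []
    for row in W:
        scale = max(abs(v) for v in row) / 1.5 or 1.0
        Q.append([scale * min(levels, key=lambda l: abs(v / scale - l))
                  for v in row])
    return Q

def rank1_correction(E, iters=50):
    """Best rank-1 approximation of the error matrix E via power
    iteration, mimicking what a rank-1 LoRA pair could absorb."""
    n, m = len(E), len(E[0])
    v = [1.0] * m
    for _ in range(iters):
        u = [sum(E[i][j] * v[j] for j in range(m)) for i in range(n)]
        nu = sum(x * x for x in u) ** 0.5 or 1.0
        u = [x / nu for x in u]
        v = [sum(E[i][j] * u[i] for i in range(n)) for j in range(m)]
    # v already carries the singular value, so the outer product u v^T
    # is the rank-1 reconstruction of E
    return [[u[i] * v[j] for j in range(m)] for i in range(n)]
```

    Adding the rank-1 term back to the quantized weights never increases, and usually shrinks, the Frobenius error relative to the original matrix; higher LoRA ranks absorb correspondingly more of the quantization error.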
    CipherSniffer: Classifying Cipher Types. (arXiv:2306.08116v1 [cs.CL])
    Ciphers are a powerful tool for encrypting communication. There are many different cipher types, which makes it computationally expensive to solve a cipher using brute force. In this paper, we frame the decryption task as a classification problem. We first create a dataset of transpositions, substitutions, text reversals, word reversals, sentence shifts, and unencrypted text. Then, we evaluate the performance of various tokenizer-model combinations on this task.
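    A sketch of how such a labeled dataset might be generated for a few of the listed cipher types (the label names and the word-level transposition scheme are illustrative, not the paper's):

```python
import random

def word_reversal(s):       # reverse each word, keep word order
    return " ".join(w[::-1] for w in s.split())

def text_reversal(s):       # reverse the whole string
    return s[::-1]

def transposition(s, seed=0):  # shuffle characters within each word
    rnd = random.Random(seed)
    out = []
    for w in s.split():
        chars = list(w)
        rnd.shuffle(chars)
        out.append("".join(chars))
    return " ".join(out)

CIPHERS = {"plain": lambda s: s, "word_rev": word_reversal,
           "text_rev": text_reversal, "transpose": transposition}

def make_dataset(sentences):
    # Each sentence yields one (ciphertext, cipher-type label) pair per cipher.
    return [(fn(s), label) for s in sentences for label, fn in CIPHERS.items()]

data = make_dataset(["the quick brown fox"])
print(data[2])  # ('xof nworb kciuq eht', 'text_rev')
```

    A tokenizer-model pair is then trained to map the ciphertext string to its label, which is the classification framing of the decryption task.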
    Safeguarding Data in Multimodal AI: A Differentially Private Approach to CLIP Training. (arXiv:2306.08173v1 [cs.LG])
    The surge in multimodal AI's success has sparked concerns over data privacy in vision-and-language tasks. While CLIP has revolutionized multimodal learning through joint training on images and text, its potential to unintentionally disclose sensitive information necessitates the integration of privacy-preserving mechanisms. We introduce a differentially private adaptation of the Contrastive Language-Image Pretraining (CLIP) model that effectively addresses privacy concerns while retaining accuracy. Our proposed method, Dp-CLIP, is rigorously evaluated on benchmark datasets encompassing diverse vision-and-language tasks such as image classification and visual question answering. We demonstrate that our approach retains performance on par with the standard non-private CLIP model. Furthermore, we analyze our proposed algorithm under linear representation settings. We derive the convergence rate of our algorithm and show a trade-off between utility and privacy when gradients are clipped per-batch and the loss function does not satisfy smoothness conditions assumed in the literature for the analysis of DP-SGD.
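    The per-batch clipping and noising that the analysis assumes can be sketched as follows (the constants are illustrative; a real privacy guarantee also requires accounting for the noise multiplier across all training steps):

```python
import numpy as np

def dp_gradient(grad, clip_norm=1.0, noise_mult=1.1, seed=0):
    """Clip a batch gradient to L2 norm <= clip_norm, then add Gaussian noise."""
    rng = np.random.default_rng(seed)
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_mult * clip_norm, size=grad.shape)
    return clipped + noise, clipped

g = np.ones(16) * 3.0                       # raw batch gradient, norm 12
private_g, clipped = dp_gradient(g)
print(np.linalg.norm(clipped))              # 1.0: clipped to the bound
```

    Clipping bounds each batch's influence on the update, and the added noise masks it; the paper's analysis studies exactly this per-batch regime when the loss is not smooth.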
    FRIGATE: Frugal Spatio-temporal Forecasting on Road Networks. (arXiv:2306.08277v1 [cs.LG])
    Modelling spatio-temporal processes on road networks is a task of growing importance. While significant progress has been made on developing spatio-temporal graph neural networks (Gnns), existing works are built upon three assumptions that are not practical on real-world road networks. First, they assume sensing on every node of a road network. In reality, due to budget constraints or sensor failures, not all locations (nodes) may be equipped with sensors. Second, they assume that sensing history is available at all installed sensors. This is unrealistic as well due to sensor failures, loss of packets during communication, etc. Finally, there is an assumption of static road networks. Connectivity within networks changes due to road closures, construction of new roads, etc. In this work, we develop FRIGATE to address all these shortcomings. FRIGATE is powered by a spatio-temporal Gnn that integrates positional, topological, and temporal information into rich inductive node representations. The joint fusion of this diverse information is made feasible through a novel combination of gated Lipschitz embeddings with Lstms. We prove that the proposed Gnn architecture is provably more expressive than message-passing Gnns used in state-of-the-art algorithms. The higher expressivity of FRIGATE naturally translates to superior empirical performance in experiments conducted on real-world network-constrained traffic data. In addition, FRIGATE is robust to frugal sensor deployment, changes in road network connectivity, and temporal irregularity in sensing.
    Operationalising Representation in Natural Language Processing. (arXiv:2306.08193v1 [cs.CL])
    Despite its centrality in the philosophy of cognitive science, there has been little prior philosophical work engaging with the notion of representation in contemporary NLP practice. This paper attempts to fill that lacuna: drawing on ideas from cognitive science, I introduce a framework for evaluating the representational claims made about components of neural NLP models, proposing three criteria with which to evaluate whether a component of a model represents a property and operationalising these criteria using probing classifiers, a popular analysis technique in NLP (and deep learning more broadly). The project of operationalising a philosophically-informed notion of representation should be of interest to both philosophers of science and NLP practitioners. It affords philosophers a novel testing-ground for claims about the nature of representation, and helps NLPers organise the large literature on probing experiments, suggesting novel avenues for empirical research.
    Robustness to Transformations Across Categories: Is Robustness To Transformations Driven by Invariant Neural Representations?. (arXiv:2007.00112v4 [cs.CV] UPDATED)
    Deep Convolutional Neural Networks (DCNNs) have demonstrated impressive robustness to recognize objects under transformations (e.g., blur or noise) when these transformations are included in the training set. A hypothesis to explain such robustness is that DCNNs develop invariant neural representations that remain unaltered when the image is transformed. However, to what extent this hypothesis holds true is an outstanding question, as robustness to transformations could be achieved with properties different from invariance, e.g., parts of the network could be specialized to recognize either transformed or non-transformed images. This paper investigates the conditions under which invariant neural representations emerge by leveraging the fact that they facilitate robustness to transformations beyond the training distribution. Concretely, we analyze a training paradigm in which only some object categories are seen transformed during training and evaluate whether the DCNN is robust to transformations across categories not seen transformed. Our results with state-of-the-art DCNNs indicate that invariant neural representations do not always drive robustness to transformations, as networks show robustness for categories seen transformed during training even in the absence of invariant neural representations. Invariance only emerges as the number of transformed categories in the training set is increased. This phenomenon is much more prominent with local transformations such as blurring and high-pass filtering than geometric transformations such as rotation and thinning, which entail changes in the spatial arrangement of the object. Our results contribute to a better understanding of invariant neural representations in deep learning and the conditions under which they spontaneously emerge.
    Survey on Sociodemographic Bias in Natural Language Processing. (arXiv:2306.08158v1 [cs.CL])
    Deep neural networks often learn unintended biases during training, which might have harmful effects when deployed in real-world settings. This paper surveys 209 papers on bias in NLP models, most of which address sociodemographic bias. To better understand the distinction between bias and real-world harm, we turn to ideas from psychology and behavioral economics to propose a definition for sociodemographic bias. We identify three main categories of NLP bias research: types of bias, quantifying bias, and debiasing. We conclude that current approaches to quantifying bias face reliability issues, that many of the bias metrics do not relate to real-world biases, and that current debiasing techniques are superficial and hide bias rather than removing it. Finally, we provide recommendations for future work.
    Graph Laplacian Learning with Exponential Family Noise. (arXiv:2306.08201v1 [cs.LG])
    A common challenge in applying graph machine learning methods is that the underlying graph of a system is often unknown. Although different graph inference methods have been proposed for continuous graph signals, inferring the graph structure underlying other types of data, such as discrete counts, is under-explored. In this paper, we generalize a graph signal processing (GSP) framework for learning a graph from smooth graph signals to the exponential family noise distribution to model various data types. We propose an alternating algorithm that estimates the graph Laplacian as well as the unobserved smooth representation from the noisy signals. We demonstrate in synthetic and real-world data that our new algorithm outperforms competing Laplacian estimation methods under noise model mismatch.
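    The smoothness assumption these GSP methods exploit can be seen on a toy path graph: a signal that varies slowly along edges has small Dirichlet energy x^T L x, which is the quantity a Laplacian estimate is fitted to (a sketch of the criterion, not the paper's alternating algorithm):

```python
import numpy as np

# Laplacian of a path graph on 5 nodes: L = D - A.
A = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
L = np.diag(A.sum(1)) - A

smooth = np.array([1.0, 1.1, 1.2, 1.3, 1.4])    # varies slowly along edges
rough  = np.array([1.0, -1.0, 1.0, -1.0, 1.0])  # alternates sign

# Dirichlet energy x^T L x = sum over edges of (x_i - x_j)^2.
print(smooth @ L @ smooth)   # 0.04
print(rough @ L @ rough)     # 16.0
```

    The paper's contribution is to keep this smoothness prior on a latent representation while modelling the observed signals (e.g., counts) through an exponential-family noise channel.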
    DCTX-Conformer: Dynamic context carry-over for low latency unified streaming and non-streaming Conformer. (arXiv:2306.08175v1 [eess.AS])
    Conformer-based end-to-end models have become ubiquitous these days and are commonly used in both streaming and non-streaming automatic speech recognition (ASR). Techniques like dual-mode and dynamic chunk training helped unify streaming and non-streaming systems. However, there remains a performance gap between streaming with a full and limited past context. To address this issue, we propose the integration of a novel dynamic contextual carry-over mechanism in a state-of-the-art (SOTA) unified ASR system. Our proposed dynamic context Conformer (DCTX-Conformer) utilizes a non-overlapping contextual carry-over mechanism that takes into account both the left context of a chunk and one or more preceding context embeddings. We outperform the SOTA by a relative 25.0% reduction in word error rate, with a negligible latency impact due to the additional context embeddings.
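    The carry-over idea can be caricatured without any attention machinery: process the input chunk-by-chunk and feed each chunk a small number of pooled embeddings from preceding chunks instead of the full past (everything below, including the additive "encoder", is a toy stand-in for the Conformer blocks):

```python
import numpy as np

def chunked_encode(x, chunk=4, n_ctx=2):
    """Each chunk is encoded together with mean-pooled embeddings of up to
    n_ctx preceding chunks, rather than attending over the entire past."""
    ctx, out = [], []
    for s in range(0, len(x), chunk):
        block = x[s:s + chunk]
        bias = np.mean(ctx[-n_ctx:], axis=0) if ctx else 0.0
        enc = block + bias                  # "encoder": past enters via carry-over
        ctx.append(enc.mean(0))             # one context embedding per chunk
        out.append(enc)
    return np.concatenate(out)

x = np.arange(12, dtype=float).reshape(12, 1)
y = chunked_encode(x)
print(y.shape)  # streaming output with O(n_ctx) carried state
```

    The carried state is constant-size regardless of how long the utterance grows, which is why the latency cost of the extra context embeddings is negligible.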
    Accelerated Convergence of Nesterov's Momentum for Deep Neural Networks under Partial Strong Convexity. (arXiv:2306.08109v1 [cs.LG])
    Current state-of-the-art analyses on the convergence of gradient descent for training neural networks focus on characterizing properties of the loss landscape, such as the Polyak-Lojaciewicz (PL) condition and the restricted strong convexity. While gradient descent converges linearly under such conditions, it remains an open question whether Nesterov's momentum enjoys accelerated convergence under similar settings and assumptions. In this work, we consider a new class of objective functions, where only a subset of the parameters satisfies strong convexity, and show that Nesterov's momentum achieves acceleration in theory for this objective class. We provide two realizations of the problem class, one of which is deep ReLU networks, which, to the best of our knowledge, makes this the first work to prove an accelerated convergence rate for non-trivial neural network architectures.
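    On an ordinary strongly convex quadratic (not the paper's partially strongly convex class), the acceleration in question is easy to observe numerically:

```python
import numpy as np

# f(x) = 0.5 * x^T diag(d) x, condition number kappa = 100.
d = np.array([1.0, 100.0])
L_s, mu = d.max(), d.min()
x_gd = x = y = np.array([1.0, 1.0])

# Classical momentum parameter for mu-strongly convex, L-smooth objectives.
beta = (np.sqrt(L_s) - np.sqrt(mu)) / (np.sqrt(L_s) + np.sqrt(mu))
for _ in range(50):
    x_gd = x_gd - (1 / L_s) * d * x_gd          # plain gradient descent
    x_new = y - (1 / L_s) * d * y               # Nesterov: gradient step at y
    y = x_new + beta * (x_new - x)              # momentum extrapolation
    x = x_new

print(np.linalg.norm(x), np.linalg.norm(x_gd))  # Nesterov is far closer to 0
```

    Gradient descent contracts at roughly (1 - 1/kappa) per step while Nesterov's method contracts at roughly (1 - 1/sqrt(kappa)), and the paper asks when this speed-up survives once only a subset of parameters is strongly convex.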
    Reinforcement Learning-Driven Linker Design via Fast Attention-based Point Cloud Alignment. (arXiv:2306.08166v1 [cs.LG])
    Proteolysis-Targeting Chimeras (PROTACs) represent a novel class of small molecules which are designed to act as a bridge between an E3 ligase and a disease-relevant protein, thereby promoting its subsequent degradation. PROTACs are composed of two protein binding "active" domains, linked by a "linker" domain. The design of the linker domain is challenging due to geometric and chemical constraints given by its interactions, and the need to maximize drug-likeness. To tackle these challenges, we introduce ShapeLinker, a method for de novo design of linkers. It performs fragment-linking using reinforcement learning on an autoregressive SMILES generator. The method optimizes for a composite score combining relevant physicochemical properties and a novel, attention-based point cloud alignment score. This new method successfully generates linkers that satisfy both relevant 2D and 3D requirements, and achieves state-of-the-art results in producing novel linkers assuming a target linker conformation. This allows for more rational and efficient PROTAC design and optimization. Code and data are available at https://github.com/aivant/ShapeLinker.
    Multi-market Energy Optimization with Renewables via Reinforcement Learning. (arXiv:2306.08147v1 [cs.LG])
    This paper introduces a deep reinforcement learning (RL) framework for optimizing the operations of power plants pairing renewable energy with storage. The objective is to maximize revenue from energy markets while minimizing storage degradation costs and renewable curtailment. The framework handles complexities such as time coupling by storage devices, uncertainty in renewable generation and energy prices, and non-linear storage models. The study treats the problem as a hierarchical Markov Decision Process (MDP) and uses component-level simulators for storage. It utilizes RL to incorporate complex storage models, overcoming restrictions of optimization-based methods that require convex and differentiable component models. A significant aspect of this approach is ensuring policy actions respect system constraints, achieved via a novel method of projecting potentially infeasible actions onto a safe state-action set. The paper demonstrates the efficacy of this approach through extensive experiments using data from US and Indian electricity markets, comparing the learned RL policies with a baseline control policy and a retrospective optimal control policy. It validates the adaptability of the learning framework with various storage models and shows the effectiveness of RL in a complex energy optimization setting, in the context of multi-market bidding, probabilistic forecasts, and accurate storage component models.
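    The safe-action projection can be sketched for a single battery with hypothetical rate and capacity limits (the paper projects onto a general safe state-action set; the constraint model and constants here are illustrative):

```python
import numpy as np

def project_action(power, soc, p_max=1.0, e_cap=4.0, dt=1.0, eff=0.95):
    """Clip a requested (dis)charge power so rate and state-of-charge limits hold."""
    p = float(np.clip(power, -p_max, p_max))    # rate limit
    if p > 0:                                   # charging: cannot overfill
        p = min(p, (e_cap - soc) / (eff * dt))
    else:                                       # discharging: cannot go below empty
        p = max(p, -soc * eff / dt)
    return p

# Nearly full battery: a full-power charge request is scaled down.
print(project_action(power=1.0, soc=3.9))
```

    Projecting the policy's raw output this way lets the RL agent explore freely while every executed action stays physically feasible.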
    Where Does My Model Underperform? A Human Evaluation of Slice Discovery Algorithms. (arXiv:2306.08167v1 [cs.HC])
    Machine learning (ML) models that achieve high average accuracy can still underperform on semantically coherent subsets (i.e. "slices") of data. This behavior can have significant societal consequences for the safety or bias of the model in deployment, but identifying these underperforming slices can be difficult in practice, especially in domains where practitioners lack access to group annotations to define coherent subsets of their data. Motivated by these challenges, ML researchers have developed new slice discovery algorithms that aim to group together coherent and high-error subsets of data. However, there has been little evaluation focused on whether these tools help humans form correct hypotheses about where (for which groups) their model underperforms. We conduct a controlled user study (N = 15) where we show 40 slices output by two state-of-the-art slice discovery algorithms to users, and ask them to form hypotheses about where an object detection model underperforms. Our results provide positive evidence that these tools offer some benefit over a naive baseline, and also shed light on challenges faced by users during the hypothesis formation step. We conclude by discussing design opportunities for ML and HCI researchers. Our findings point to the importance of centering users when designing and evaluating new tools for slice discovery.
    Uncertainty-Aware Robust Learning on Noisy Graphs. (arXiv:2306.08210v1 [cs.LG])
    Graph neural networks have shown impressive capabilities in solving various graph learning tasks, particularly excelling in node classification. However, their effectiveness can be hindered by the challenges arising from the widespread existence of noisy measurements associated with the topological or nodal information present in real-world graphs. These inaccuracies in observations can corrupt the crucial patterns within the graph data, ultimately resulting in undesirable performance in practical applications. To address these issues, this paper proposes a novel uncertainty-aware graph learning framework motivated by distributionally robust optimization. Specifically, we use a graph neural network-based encoder to embed the node features and find the optimal node embeddings by minimizing the worst-case risk through a minimax formulation. Such an uncertainty-aware learning process leads to improved node representations and a more robust graph predictive model that effectively mitigates the impact of uncertainty arising from data noise. Our experimental results show that the proposed framework achieves superior predictive performance compared to the state-of-the-art baselines under various noisy settings.
    Can ChatGPT Enable ITS? The Case of Mixed Traffic Control via Reinforcement Learning. (arXiv:2306.08094v1 [cs.AI])
    The surge in Reinforcement Learning (RL) applications in Intelligent Transportation Systems (ITS) has contributed to its growth as well as highlighted key challenges. However, defining objectives of RL agents in traffic control and management tasks, as well as aligning policies with these goals through an effective formulation of Markov Decision Process (MDP), can be challenging and often require domain experts in both RL and ITS. Recent advancements in Large Language Models (LLMs) such as GPT-4 highlight their broad general knowledge, reasoning capabilities, and commonsense priors across various domains. In this work, we conduct a large-scale user study involving 70 participants to investigate whether novices can leverage ChatGPT to solve complex mixed traffic control problems. Three environments are tested, including ring road, bottleneck, and intersection. We find ChatGPT has mixed results. For the intersection and bottleneck environments, ChatGPT increases the number of successful policies by 150% and 136%, respectively, compared to solely beginner capabilities, with some of them even outperforming experts. However, ChatGPT does not provide consistent improvements across all scenarios.
    Explainable and Position-Aware Learning in Digital Pathology. (arXiv:2306.08198v1 [eess.IV])
    Encoding whole slide images (WSI) as graphs is well motivated since it makes it possible for the gigapixel resolution WSI to be represented in its entirety for the purpose of graph learning. To this end, WSIs can be broken into smaller patches that represent the nodes of the graph. Then, graph-based learning methods can be utilized for the grading and classification of cancer. Message passing among neighboring nodes is the foundation of graph-based learning methods. However, they do not take into consideration any positional information for any of the patches, and if two patches are found in topologically isomorphic neighborhoods, their embeddings are nearly identical to one another. In this work, classification of cancer from WSIs is performed with positional embedding and graph attention. In order to represent the positional embedding of the nodes in graph classification, the proposed method makes use of spline convolutional neural networks (CNN). The algorithm is then tested with the WSI dataset for grading prostate cancer and kidney cancer. A comparison of the proposed method with leading approaches in cancer diagnosis and grading verifies improved performance. The identification of cancerous regions in WSIs is another critical task in cancer diagnosis. In this work, the explainability of the proposed model is also addressed. A gradient-based explainability approach is used to generate the saliency mapping for the WSIs. This can be used to examine regions of the WSI that are responsible for the cancer diagnosis, thus rendering the proposed model explainable.
    Implicit Compressibility of Overparametrized Neural Networks Trained with Heavy-Tailed SGD. (arXiv:2306.08125v1 [stat.ML])
    Neural network compression has been an increasingly important subject, due to its practical implications in terms of reducing the computational requirements and its theoretical implications, as there is an explicit connection between compressibility and the generalization error. Recent studies have shown that the choice of the hyperparameters of stochastic gradient descent (SGD) can have an effect on the compressibility of the learned parameter vector. Even though these results have shed some light on the role of the training dynamics over compressibility, they relied on unverifiable assumptions and the resulting theory does not provide a practical guideline due to its implicitness. In this study, we propose a simple modification for SGD, such that the outputs of the algorithm will be provably compressible without making any nontrivial assumptions. We consider a one-hidden-layer neural network trained with SGD and we inject additive heavy-tailed noise to the iterates at each iteration. We then show that, for any compression rate, there exists a level of overparametrization (i.e., the number of hidden units), such that the output of the algorithm will be compressible with high probability. To achieve this result, we make two main technical contributions: (i) we build on a recent study on stochastic analysis and prove a 'propagation of chaos' result with improved rates for a class of heavy-tailed stochastic differential equations, and (ii) we derive strong-error estimates for their Euler discretization. We finally illustrate our approach on experiments, where the results suggest that the proposed approach achieves compressibility with a slight compromise in training and test error.
    Neural Mixed Effects for Nonlinear Personalized Predictions. (arXiv:2306.08149v1 [cs.LG])
    Personalized prediction is a machine learning approach that predicts a person's future observations based on their past labeled observations and is typically used for sequential tasks, e.g., to predict daily mood ratings. When making personalized predictions, a model can combine two types of trends: (a) trends shared across people, i.e., person-generic trends, such as being happier on weekends, and (b) unique trends for each person, i.e., person-specific trends, such as a stressful weekly meeting. Mixed effect models are popular statistical models to study both trends by combining person-generic and person-specific parameters. Though linear mixed effect models are gaining popularity in machine learning by integrating them with neural networks, these integrations are currently limited to linear person-specific parameters: ruling out nonlinear person-specific trends. In this paper, we propose Neural Mixed Effect (NME) models to optimize nonlinear person-specific parameters anywhere in a neural network in a scalable manner. NME combines the efficiency of neural network optimization with nonlinear mixed effects modeling. Empirically, we observe that NME improves performance across six unimodal and multimodal datasets, including a smartphone dataset to predict daily mood and a mother-adolescent dataset to predict affective state sequences where half the mothers experience at least moderate symptoms of depression. Furthermore, we evaluate NME for two model architectures, including for neural conditional random fields (CRF) to predict affective state sequences where the CRF learns nonlinear person-specific temporal transitions between affective states. Analysis of these person-specific transitions on the mother-adolescent dataset shows interpretable trends related to the mother's depression symptoms.
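    The core idea — person-specific parameters inside a nonlinearity, not just a linear offset — fits in a few lines (the offsets are hand-set here; NME learns them jointly with the shared weights by gradient descent):

```python
import numpy as np

# Person-generic weight w plus a person-specific offset w_p used *inside*
# the nonlinearity: the nonlinear person-specific parameters NME allows.
w, b = 0.8, 0.1
person_offsets = {"p1": 0.0, "p2": 0.5}   # illustrative learned offsets

def predict(x, person):
    return np.tanh((w + person_offsets[person]) * x + b)

x = 1.0
print(predict(x, "p1"), predict(x, "p2"))  # same input, person-specific outputs
```

    Because the offset sits inside tanh, the two people differ by a nonlinear warping of the input-output map, which a purely linear person-specific term cannot express.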
    Inductive Linear Probing for Few-shot Node Classification. (arXiv:2306.08192v1 [cs.LG])
    Meta-learning has emerged as a powerful training strategy for few-shot node classification, demonstrating its effectiveness in the transductive setting. However, the existing literature predominantly focuses on transductive few-shot node classification, neglecting the widely studied inductive setting in the broader few-shot learning community. This oversight limits our comprehensive understanding of the performance of meta-learning based methods on graph data. In this work, we conduct an empirical study to highlight the limitations of current frameworks in the inductive few-shot node classification setting. Additionally, we propose a simple yet competitive baseline approach specifically tailored for inductive few-shot node classification tasks. We hope our work can provide a new path forward to better understand how the meta-learning paradigm works in the graph domain.
    Trustworthy Artificial Intelligence Framework for Proactive Detection and Risk Explanation of Cyber Attacks in Smart Grid. (arXiv:2306.07993v1 [cs.CR])
    The rapid growth of distributed energy resources (DERs), such as renewable energy sources, generators, consumers, and prosumers in the smart grid infrastructure, poses significant cybersecurity and trust challenges to the grid controller. Consequently, it is crucial to identify adversarial tactics and measure the strength of the attacker's DER. To enable a trustworthy smart grid controller, this work investigates a trustworthy artificial intelligence (AI) mechanism for proactive identification and explanation of the cyber risk caused by the control/status messages of DERs. We thus propose and develop a trustworthy AI framework that facilitates the deployment of any AI algorithm for detecting potential cyber threats, analyzes root causes based on Shapley value interpretation, and dynamically quantifies the risk of an attack based on Ward's minimum variance formula. The experiment with a state-of-the-art dataset establishes the proposed framework as a trustworthy AI by fulfilling the capabilities of reliability, fairness, explainability, transparency, reproducibility, and accountability.
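    The Shapley-value attribution mentioned above can be computed exactly for a tiny toy characteristic function by enumerating coalitions (the feature names and value function are hypothetical; practical frameworks approximate this sum when there are many features):

```python
from itertools import combinations
from math import factorial

players = ("voltage", "frequency", "status")    # hypothetical DER features

def value(coalition):
    # Toy characteristic function: additive scores plus an interaction bonus.
    v = {"voltage": 2.0, "frequency": 1.0, "status": 0.5}
    return sum(v[p] for p in coalition) + (1.0 if len(coalition) >= 2 else 0.0)

def shapley(player):
    # Exact Shapley value: weighted marginal contribution over all coalitions.
    n, total = len(players), 0.0
    others = [p for p in players if p != player]
    for k in range(len(others) + 1):
        for S in combinations(others, k):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (value(S + (player,)) - value(S))
    return total

phi = {p: shapley(p) for p in players}
print(phi)
```

    The efficiency axiom guarantees the attributions sum to the grand coalition's value, which is what makes Shapley scores usable as a complete decomposition of a risk prediction.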
    DHBE: Data-free Holistic Backdoor Erasing in Deep Neural Networks via Restricted Adversarial Distillation. (arXiv:2306.08009v1 [cs.LG])
    Backdoor attacks have emerged as an urgent threat to Deep Neural Networks (DNNs), where victim DNNs are furtively implanted with malicious neurons that could be triggered by the adversary. To defend against backdoor attacks, many works establish a staged pipeline to remove backdoors from victim DNNs: inspecting, locating, and erasing. However, in scenarios where only a small amount of clean data is accessible, such a pipeline is fragile and cannot erase backdoors completely without sacrificing model accuracy. To address this issue, in this paper, we propose a novel data-free holistic backdoor erasing (DHBE) framework. Instead of the staged pipeline, DHBE treats the backdoor erasing task as a unified adversarial procedure, which seeks equilibrium between two different competing processes: distillation and backdoor regularization. In distillation, the backdoored DNN is distilled into a proxy model, transferring its knowledge about clean data, yet backdoors are simultaneously transferred. In backdoor regularization, the proxy model is holistically regularized to prevent infection by any backdoor transferred during distillation. These two processes jointly proceed with data-free adversarial optimization until a clean, high-accuracy proxy model is obtained. With the novel adversarial design, our framework demonstrates its superiority in three aspects: 1) minimal detriment to model accuracy, 2) high tolerance for hyperparameters, and 3) no demand for clean data. Extensive experiments on various backdoor attacks and datasets are performed to verify the effectiveness of the proposed framework. Code is available at \url{https://github.com/yanzhicong/DHBE}
    Securing Visually-Aware Recommender Systems: An Adversarial Image Reconstruction and Detection Framework. (arXiv:2306.07992v1 [cs.CV])
    With rich visual data, such as images, becoming readily associated with items, visually-aware recommendation systems (VARS) have been widely used in different applications. Recent studies have shown that VARS are vulnerable to item-image adversarial attacks, which add human-imperceptible perturbations to the clean images associated with those items. Attacks on VARS pose new security challenges to a wide range of applications such as e-Commerce and social networks where VARS are widely used. How to secure VARS from such adversarial attacks becomes a critical problem. Currently, there is still a lack of systematic study on how to design secure defense strategies against visual attacks on VARS. In this paper, we attempt to fill this gap by proposing an adversarial image reconstruction and detection framework to secure VARS. Our proposed method can simultaneously (1) secure VARS from adversarial attacks characterized by local perturbations by image reconstruction based on global vision transformers; and (2) accurately detect adversarial examples using a novel contrastive learning approach. Meanwhile, our framework is designed to be used as both a filter and a detector so that they can be jointly trained to improve the flexibility of our defense strategy to a variety of attacks and VARS models. We have conducted extensive experimental studies with two popular attack methods (FGSM and PGD). Our experimental results on two real-world datasets show that our defense strategy against visual attacks is effective and outperforms existing methods on different attacks. Moreover, our method can detect adversarial examples with high accuracy.
    Semantic-Based Neural Network Repair. (arXiv:2306.07995v1 [cs.LG])
    Recently, neural networks have spread into numerous fields including many safety-critical systems. Neural networks are built (and trained) by programming in frameworks such as TensorFlow and PyTorch. Developers apply a rich set of pre-defined layers to manually program neural networks or to automatically generate them (e.g., through AutoML). Composing neural networks with different layers is error-prone due to the non-trivial constraints that must be satisfied in order to use those layers. In this work, we propose an approach to automatically repair erroneous neural networks. The challenge is in identifying a minimal modification to the network so that it becomes valid. Modifying a layer might have cascading effects on subsequent layers and thus our approach must search recursively to identify a "globally" minimal modification. Our approach is based on an executable semantics of deep learning layers and focuses on four kinds of errors which are common in practice. We evaluate our approach for two usage scenarios, i.e., repairing automatically generated neural networks and manually written ones suffering from common model bugs. The results show that we are able to repair 100% of a set of randomly generated neural networks (which are produced with an existing AI framework testing approach) effectively and efficiently (with an average repair time of 21.08s) and 93.75% of a collection of real neural network bugs (with an average time of 3min 40s).
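    A miniature version of the cascading-constraint problem: with linear layers reduced to (in_dim, out_dim) pairs, a minimal repair rewrites each mismatched input dimension to the preceding output dimension and propagates the change forward (the paper's executable semantics handle far richer layer types and error kinds):

```python
def repair_dims(layers):
    """Make a chain of (in_dim, out_dim) linear layers composable by
    minimally rewriting each mismatched in_dim to the previous out_dim."""
    fixed, edits = [layers[0]], 0
    for (fan_in, fan_out) in layers[1:]:
        prev_out = fixed[-1][1]
        if fan_in != prev_out:          # cascading constraint: dims must chain
            fan_in, edits = prev_out, edits + 1
        fixed.append((fan_in, fan_out))
    return fixed, edits

broken = [(784, 128), (64, 64), (64, 10)]   # layer 2 expects 64, receives 128
print(repair_dims(broken))  # ([(784, 128), (128, 64), (64, 10)], 1)
```

    Note that repairing layer 2 changes its output contract only if its out_dim is edited; here a single edit suffices, which is the "globally minimal modification" flavour of the search.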
    A Survey on Cross-Architectural IoT Malware Threat Hunting. (arXiv:2306.07989v1 [cs.CR])
    In recent years, the increase in non-Windows malware threats has turned the focus of the cybersecurity community toward them. Research on hunting Windows PE-based malware is maturing, whereas developments in Linux malware threat hunting are relatively scarce. With the advent of the Internet of Things (IoT) era, smart devices that are integrated into human life have become a hackers' highway for malicious activities. The IoT devices employ various Unix-based architectures that follow ELF (Executable and Linkable Format) as their standard binary file specification. This study aims to provide a comprehensive survey on the latest developments in cross-architectural IoT malware detection and classification approaches. Aided by a modern taxonomy, we discuss the feature representations, feature extraction techniques, and machine learning models employed in the surveyed works. We further provide more insights on the practical challenges involved in cross-architectural IoT malware threat hunting and discuss various avenues to instill potential future research.
    TopP\&R: Robust Support Estimation Approach for Evaluating Fidelity and Diversity in Generative Models. (arXiv:2306.08013v1 [cs.LG])
    We propose a robust and reliable evaluation metric for generative models by introducing topological and statistical treatments for rigorous support estimation. Existing metrics, such as Inception Score (IS), Fr\'echet Inception Distance (FID), and the variants of Precision and Recall (P\&R), heavily rely on supports that are estimated from sample features. However, the reliability of their estimation has not been seriously discussed (and is often overlooked) even though the quality of the evaluation depends entirely on it. In this paper, we propose Topological Precision and Recall (TopP\&R, pronounced 'topper'), which provides a systematic approach to estimating supports, retaining only topologically and statistically important features with a certain level of confidence. This not only makes TopP\&R robust to noisy features, but also provides statistical consistency. Our theoretical and experimental results show that TopP\&R is robust to outliers and non-independent and identically distributed (Non-IID) perturbations, while accurately capturing the true trend of change in samples. To the best of our knowledge, this is the first evaluation metric focused on the robust estimation of the support and provides its statistical consistency under noise.
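    For context, here is the kind of support-estimation-based metric TopP\&R aims to make robust: estimate the real support as a union of k-NN balls and score the fraction of generated samples falling inside it (a plain sketch of the classic estimator; TopP\&R's topological and statistical treatment is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
real      = rng.standard_normal((200, 2))
fake_good = rng.standard_normal((200, 2))        # matches the real distribution
fake_bad  = rng.standard_normal((200, 2)) + 6.0  # clearly off-support

def precision(real, fake, k=5):
    """Fraction of fake samples inside the estimated real support: a union
    of balls centred at real samples with k-th-nearest-neighbour radii."""
    d = np.linalg.norm(real[:, None] - real[None], axis=-1)
    radii = np.sort(d, axis=1)[:, k]              # k-NN distance per sample
    df = np.linalg.norm(fake[:, None] - real[None], axis=-1)
    return float((df <= radii[None, :]).any(axis=1).mean())

print(precision(real, fake_good), precision(real, fake_bad))
```

    A single far-flung outlier in the real set inflates one ball's radius and can swallow off-support fakes, which is exactly the fragility the paper's confidence-based support estimation addresses.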
    Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models. (arXiv:2306.08018v1 [q-bio.QM])
    Large Language Models (LLMs), with their remarkable task-handling capabilities and innovative outputs, have catalyzed significant advancements across a spectrum of fields. However, their proficiency within specialized domains such as biomolecular studies remains limited. To address this challenge, we introduce Mol-Instructions, a meticulously curated, comprehensive instruction dataset expressly designed for the biomolecular realm. Mol-Instructions is composed of three pivotal components: molecule-oriented instructions, protein-oriented instructions, and biomolecular text instructions, each curated to enhance the understanding and prediction capabilities of LLMs concerning biomolecular features and behaviors. Through extensive instruction tuning experiments on the representative LLM, we underscore the potency of Mol-Instructions to enhance the adaptability and cognitive acuity of large models within the complex sphere of biomolecular studies, thereby promoting advancements in the biomolecular research community. Mol-Instructions is made publicly accessible for future research endeavors and will be subjected to continual updates for enhanced applicability.
    (Amplified) Banded Matrix Factorization: A unified approach to private training. (arXiv:2306.08153v1 [cs.LG])
    Matrix factorization (MF) mechanisms for differential privacy (DP) have substantially improved the state-of-the-art in privacy-utility-computation tradeoffs for ML applications in a variety of scenarios, but in both the centralized and federated settings there remain instances where either MF cannot be easily applied, or other algorithms provide better tradeoffs (typically, as $\epsilon$ becomes small). In this work, we show how MF can subsume prior state-of-the-art algorithms in both federated and centralized training settings, across all privacy budgets. The key technique throughout is the construction of MF mechanisms with banded matrices. For cross-device federated learning (FL), this enables multiple participations with a relaxed device participation schema compatible with practical FL infrastructure (as demonstrated by a production deployment). In the centralized setting, we prove that banded matrices enjoy the same privacy amplification results as for the ubiquitous DP-SGD algorithm, but can provide strictly better performance in most scenarios -- this lets us always at least match DP-SGD, and often outperform it even at $\epsilon\ll2$. Finally, $\hat{b}$-banded matrices substantially reduce the memory and time complexity of per-step noise generation from $\mathcal{O}(n)$, with $n$ the total number of iterations, to a constant $\mathcal{O}(\hat{b})$, compared to general MF mechanisms.
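The constant-memory noise generation enabled by banded matrices can be sketched as a streaming convolution: only the last $\hat{b}$ i.i.d. Gaussian draws need to be retained. The constant band coefficients `c` below are a simplifying assumption (a general banded $C$ has per-row entries):

```python
import random

def banded_noise_stream(n, b, c, sigma=1.0, seed=0):
    """Stream correlated noise for a b-banded lower-triangular C with
    constant band coefficients c[0..b-1]: noise_t = sum_j c[j] * z_{t-j}.
    Only the last b i.i.d. Gaussian draws are kept, so memory is O(b)
    instead of O(n)."""
    rng = random.Random(seed)
    buf = []  # ring buffer of the most recent Gaussian samples
    for t in range(n):
        buf.append(rng.gauss(0.0, sigma))
        if len(buf) > b:
            buf.pop(0)
        yield sum(c[j] * buf[-1 - j] for j in range(min(b, len(buf))))

noise = list(banded_noise_stream(n=5, b=2, c=[1.0, 0.5]))
```

For a general (non-banded) MF mechanism, each step's noise would depend on all $n$ Gaussian draws, which is exactly the cost the band structure avoids.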
    MSSRNet: Manipulating Sequential Style Representation for Unsupervised Text Style Transfer. (arXiv:2306.07994v1 [cs.CL])
    The unsupervised text style transfer task aims to rewrite a text into a target style while preserving its main content. Traditional methods rely on a fixed-size vector to regulate text style, which makes it difficult to accurately convey the style strength of each individual token. In fact, each token of a text carries a different style intensity and makes a different contribution to the overall style. Our proposed method addresses this issue by assigning an individual style vector to each token in a text, allowing fine-grained control and manipulation of style strength. Additionally, an adversarial training framework integrated with teacher-student learning is introduced to enhance training stability and reduce the complexity of high-dimensional optimization. Our experimental results demonstrate the efficacy of our method, with clearly improved style transfer accuracy and content preservation in both two-style and multi-style transfer settings.
    Machine Learning Approach on Multiclass Classification of Internet Firewall Log Files. (arXiv:2306.07997v1 [cs.CR])
    Firewalls are critical components in securing communication networks: they screen all incoming (and occasionally outgoing) data packets, comparing them against a set of rules designed to prevent malicious code from entering the network. To regulate the flow of packets entering and leaving a network, an Internet firewall keeps track of all activity in log files. While the primary function of log files is to aid troubleshooting and diagnostics, the information they contain is also highly relevant to system audits and forensics. To better defend against cyberattacks and understand when and how malicious actions affect the internet, it is necessary to examine these log files. Based on its rules, the firewall decides whether to 'allow,' 'deny,' 'drop,' or 'reset-both' each incoming or outgoing packet. In this research, we apply various classification algorithms to make sense of the data logged by a firewall device. Classifier performance is compared using precision, recall, and the harmonic-mean F1 score, with the random forest technique achieving a 99% accuracy score. The high accuracy rates produced by the other methods as well indicate that the proposed features contribute significantly to the firewall classification rate.
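The per-class evaluation described above (precision, recall, and their harmonic-mean F1 over the four firewall actions) can be sketched as plain confusion-count bookkeeping; this tiny helper is generic, not the paper's code:

```python
ACTIONS = ["allow", "deny", "drop", "reset-both"]

def per_class_f1(y_true, y_pred):
    """Per-class precision, recall and F1 (harmonic mean of the two)
    for the four firewall actions."""
    scores = {}
    for c in ACTIONS:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[c] = (prec, rec, f1)
    return scores
```

Per-class scores matter here because the action labels are heavily imbalanced in real firewall logs, so overall accuracy alone can be misleading.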
    DORSal: Diffusion for Object-centric Representations of Scenes $\textit{et al.}$. (arXiv:2306.08068v1 [cs.CV])
    Recent progress in 3D scene understanding enables scalable learning of representations across large datasets of diverse scenes. As a consequence, generalization to unseen scenes and objects, rendering novel views from just a single or a handful of input images, and controllable scene generation that supports editing, is now possible. However, training jointly on a large number of scenes typically compromises rendering quality when compared to single-scene optimized models such as NeRFs. In this paper, we leverage recent progress in diffusion models to equip 3D scene representation learning models with the ability to render high-fidelity novel views, while retaining benefits such as object-level scene editing to a large degree. In particular, we propose DORSal, which adapts a video diffusion architecture for 3D scene generation conditioned on object-centric slot-based representations of scenes. On both complex synthetic multi-object scenes and on the real-world large-scale Street View dataset, we show that DORSal enables scalable neural rendering of 3D scenes with object-level editing and improves upon existing approaches.
    Chainlet Orbits: Topological Address Embedding for the Bitcoin Blockchain. (arXiv:2306.07974v1 [cs.CR])
    The rise of cryptocurrencies like Bitcoin, which enable transactions with a degree of pseudonymity, has led to a surge in various illicit activities, including ransomware payments and transactions on darknet markets. These illegal activities often utilize Bitcoin as the preferred payment method. However, current tools for detecting illicit behavior either rely on a few heuristics and laborious data collection processes or employ computationally inefficient graph neural network (GNN) models that are challenging to interpret. To overcome the computational and interpretability limitations of existing techniques, we introduce an effective solution called Chainlet Orbits. This approach embeds Bitcoin addresses by leveraging their topological characteristics in transactions. By employing our innovative address embedding, we investigate e-crime in Bitcoin networks by focusing on distinctive substructures that arise from illicit behavior. The results of our node classification experiments demonstrate superior performance compared to state-of-the-art methods, including both topological and GNN-based approaches. Moreover, our approach enables the use of interpretable and explainable machine learning models in as little as 15 minutes for most days on the Bitcoin transaction network.
    Beyond Black Box AI-Generated Plagiarism Detection: From Sentence to Document Level. (arXiv:2306.08122v1 [cs.CL])
    The increasing reliance on large language models (LLMs) in academic writing has led to a rise in plagiarism. Existing AI-generated text classifiers have limited accuracy and often produce false positives. We propose a novel approach using natural language processing (NLP) techniques, offering quantifiable metrics at both sentence and document levels for easier interpretation by human evaluators. Our method employs a multi-faceted approach, generating multiple paraphrased versions of a given question and inputting them into the LLM to generate answers. By using a contrastive loss function based on cosine similarity, we match generated sentences with those from the student's response. Our approach achieves up to 94% accuracy in classifying human and AI text, providing a robust and adaptable solution for plagiarism detection in academic settings. This method improves with LLM advancements, reducing the need for new model training or reconfiguration, and offers a more transparent way of evaluating and detecting AI-generated text.
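The sentence-matching step can be sketched with a simple similarity search: each student sentence is paired with its closest LLM-generated sentence by cosine similarity. Bag-of-words vectors stand in here for the learned embeddings the contrastive loss would operate on:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words sentence vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def match_sentences(student_sents, llm_sents):
    """For each student sentence, report the most similar LLM-generated
    sentence and the similarity score -- the per-sentence evidence the
    abstract describes."""
    return [max(((cosine(s, g), g) for g in llm_sents), key=lambda t: t[0])
            for s in student_sents]
```

The per-sentence scores can then be aggregated into a document-level metric, which is what makes the result interpretable for a human evaluator.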
    Unbiased Learning of Deep Generative Models with Structured Discrete Representations. (arXiv:2306.08230v1 [cs.LG])
    By composing graphical models with deep learning architectures, we learn generative models with the strengths of both frameworks. The structured variational autoencoder (SVAE) inherits structure and interpretability from graphical models, and flexible likelihoods for high-dimensional data from deep learning, but poses substantial optimization challenges. We propose novel algorithms for learning SVAEs, and are the first to demonstrate the SVAE's ability to handle multimodal uncertainty when data is missing by incorporating discrete latent variables. Our memory-efficient implicit differentiation scheme makes the SVAE tractable to learn via gradient descent, while demonstrating robustness to incomplete optimization. To more rapidly learn accurate graphical model parameters, we derive a method for computing natural gradients without manual derivations, which avoids biases found in prior work. These optimization innovations enable the first comparisons of the SVAE to state-of-the-art time series models, where the SVAE performs competitively while learning interpretable and structured discrete data representations.
    Model-Free Market Risk Hedging Using Crowding Networks. (arXiv:2306.08105v1 [q-fin.PM])
    Crowding is widely regarded as one of the most important risk factors in designing portfolio strategies. In this paper, we analyze stock crowding using network analysis of fund holdings, which is used to compute crowding scores for stocks. These scores are used to construct costless long-short portfolios, computed in a distribution-free (model-free) way and without using any numerical optimization, with desirable properties of hedge portfolios. More specifically, these long-short portfolios provide protection for both small and large market price fluctuations, due to their negative correlation with the market and positive convexity as a function of market returns. By adding our long-short portfolio to a baseline portfolio such as a traditional 60/40 portfolio, our method provides an alternative way to hedge portfolio risk including tail risk, which does not require costly option-based strategies or complex numerical optimization. The total cost of such hedging amounts to the total cost of rebalancing the hedge portfolio.
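A heavily simplified sketch of the pipeline: count fund co-holdings as a crowding score, then form a costless long-short portfolio from the least- and most-crowded names. The paper's scores come from a richer network analysis; degree counting and the one-third cutoff below are illustrative stand-ins:

```python
def crowding_scores(holdings):
    """Crowding sketch: count how many funds hold each stock in the
    fund-holdings bipartite network (a degree-based stand-in for the
    paper's network-analysis scores)."""
    scores = {}
    for fund, stocks in holdings.items():
        for s in stocks:
            scores[s] = scores.get(s, 0) + 1
    return scores

def long_short_portfolio(holdings):
    """Costless hedge sketch: short the most-crowded names, go long the
    least-crowded, with equal gross weight on each leg so the net
    position is zero."""
    scores = crowding_scores(holdings)
    ranked = sorted(scores, key=scores.get)
    k = max(1, len(ranked) // 3)
    longs, shorts = ranked[:k], ranked[-k:]
    weights = {s: 1.0 / k for s in longs}
    weights.update({s: -1.0 / k for s in shorts})
    return weights
```

Because the weights sum to zero, the overlay can be added to a baseline (e.g. 60/40) portfolio without changing its capital requirement.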
    Symbolic Regression via Control Variable Genetic Programming. (arXiv:2306.08057v1 [cs.NE])
    Learning symbolic expressions directly from experiment data is a vital step in AI-driven scientific discovery. Nevertheless, state-of-the-art approaches are limited to learning simple expressions; regressing expressions involving many independent variables still remains out of reach. Motivated by the control variable experiments widely utilized in science, we propose Control Variable Genetic Programming (CVGP) for symbolic regression over many independent variables. CVGP expedites symbolic expression discovery via customized experiment design, rather than learning from a fixed dataset collected a priori. CVGP starts by fitting simple expressions involving a small set of independent variables using genetic programming, under controlled experiments where the other variables are held constant. It then extends expressions learned in previous generations by adding new independent variables, using new control variable experiments in which these variables are allowed to vary. Theoretically, we show that CVGP, as an incremental building approach, can yield an exponential reduction in the search space when learning a class of expressions. Experimentally, CVGP outperforms several baselines in learning symbolic expressions involving multiple independent variables.
    AutoML in the Age of Large Language Models: Current Challenges, Future Opportunities and Risks. (arXiv:2306.08107v1 [cs.LG])
    The fields of both Natural Language Processing (NLP) and Automated Machine Learning (AutoML) have achieved remarkable results over the past years. In NLP, especially Large Language Models (LLMs) have experienced a rapid series of breakthroughs very recently. We envision that the two fields can radically push the boundaries of each other through tight integration. To showcase this vision, we explore the potential of a symbiotic relationship between AutoML and LLMs, shedding light on how they can benefit each other. In particular, we investigate both the opportunities to enhance AutoML approaches with LLMs from different perspectives and the challenges of leveraging AutoML to further improve LLMs. To this end, we survey existing work, and we critically assess risks. We strongly believe that the integration of the two fields has the potential to disrupt both fields, NLP and AutoML. By highlighting conceivable synergies, but also risks, we aim to foster further exploration at the intersection of AutoML and LLMs.
    On Faking a Nash Equilibrium. (arXiv:2306.08041v1 [cs.MA])
    We characterize offline data poisoning attacks on Multi-Agent Reinforcement Learning (MARL), where an attacker may change a data set in an attempt to install a (potentially fictitious) unique Markov-perfect Nash equilibrium. We propose the unique Nash set, namely the set of games, specified by their Q functions, with a specific joint policy being the unique Nash equilibrium. The unique Nash set is central to poisoning attacks because the attack is successful if and only if data poisoning pushes all plausible games inside it. The unique Nash set generalizes the reward polytope commonly used in inverse reinforcement learning to MARL. For zero-sum Markov games, both the unique Nash set and the set of plausible games induced by data are polytopes in the Q function space. We exhibit a linear program to efficiently compute the optimal poisoning attack. Our work sheds light on the structure of data poisoning attacks on offline MARL, a necessary step before one can design more robust MARL algorithms.
    Improving Zero-Shot Detection of Low Prevalence Chest Pathologies using Domain Pre-trained Language Models. (arXiv:2306.08000v1 [physics.med-ph])
    Recent advances in zero-shot learning have enabled the use of paired image-text data in place of structured labels, removing the need for expert-annotated datasets. Models such as the CLIP-based CheXzero utilize these advancements in the domain of chest X-ray interpretation. We hypothesize that domain pre-trained models such as CXR-BERT, BlueBERT, and ClinicalBERT offer the potential to improve the performance of CLIP-like models with specific domain knowledge by replacing the BERT weights, at the cost of breaking the original model's alignment. We evaluate the performance of zero-shot classification models with domain-specific pre-training for detecting low-prevalence pathologies. Even though replacing the weights of the original CLIP-BERT degrades model performance on commonly found pathologies, we show that pre-trained text towers perform substantially better on low-prevalence diseases. This motivates future ensemble models combining differently trained language models for maximal performance.
    Diffusion-based Conditional ECG Generation with Structured State Space Models. (arXiv:2301.08227v2 [eess.SP] UPDATED)
    Synthetic data generation is a promising solution to address privacy issues with the distribution of sensitive health data. Recently, diffusion models have set new standards for generative models for different data modalities. Also very recently, structured state space models emerged as a powerful modeling paradigm to capture long-term dependencies in time series. We put forward SSSD-ECG, as the combination of these two technologies, for the generation of synthetic 12-lead electrocardiograms conditioned on more than 70 ECG statements. Due to a lack of reliable baselines, we also propose conditional variants of two state-of-the-art unconditional generative models. We thoroughly evaluate the quality of the generated samples, by evaluating pretrained classifiers on the generated data and by evaluating the performance of a classifier trained only on synthetic data, where SSSD-ECG clearly outperforms its GAN-based competitors. We demonstrate the soundness of our approach through further experiments, including conditional class interpolation and a clinical Turing test demonstrating the high quality of the SSSD-ECG samples across a wide range of conditions.
    Improving Training Stability for Multitask Ranking Models in Recommender Systems. (arXiv:2302.09178v2 [cs.LG] UPDATED)
    Recommender systems play an important role in many content platforms. While most recommendation research is dedicated to designing better models to improve user experience, we found that research on stabilizing the training of such models is severely under-explored. As recommendation models become larger and more sophisticated, they are more susceptible to training instability issues, i.e., loss divergence, which can make a model unusable, waste significant resources, and block model development. In this paper, we share the findings and best practices we learned while improving the training stability of a real-world multitask ranking model for YouTube recommendations. We show some properties of the model that lead to unstable training and conjecture on the causes. Furthermore, based on our observations of training dynamics near the point of training instability, we hypothesize why existing solutions fail, and propose a new algorithm to mitigate their limitations. Our experiments on a YouTube production dataset show that the proposed algorithm can significantly improve training stability without compromising convergence, compared with several commonly used baseline methods.
    LightGCL: Simple Yet Effective Graph Contrastive Learning for Recommendation. (arXiv:2302.08191v3 [cs.IR] UPDATED)
    Graph neural network (GNN) is a powerful learning approach for graph-based recommender systems. Recently, GNNs integrated with contrastive learning have shown superior performance in recommendation with their data augmentation schemes, aiming at dealing with highly sparse data. Despite their success, most existing graph contrastive learning methods either perform stochastic augmentation (e.g., node/edge perturbation) on the user-item interaction graph, or rely on the heuristic-based augmentation techniques (e.g., user clustering) for generating contrastive views. We argue that these methods cannot well preserve the intrinsic semantic structures and are easily biased by the noise perturbation. In this paper, we propose a simple yet effective graph contrastive learning paradigm LightGCL that mitigates these issues impairing the generality and robustness of CL-based recommenders. Our model exclusively utilizes singular value decomposition for contrastive augmentation, which enables the unconstrained structural refinement with global collaborative relation modeling. Experiments conducted on several benchmark datasets demonstrate the significant improvement in performance of our model over the state-of-the-arts. Further analyses demonstrate the superiority of LightGCL's robustness against data sparsity and popularity bias. The source code of our model is available at https://github.com/HKUDS/LightGCL.
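The SVD-based augmentation can be illustrated with plain power iteration, which recovers the top singular triplet of the user-item interaction matrix; the rank-1 reconstruction below stands in for the low-rank contrastive view (LightGCL retains several singular components and operates on large sparse graphs):

```python
def rank1_approximation(A, iters=100):
    """Power iteration on A^T A recovers the top singular triplet
    (u, sigma, v) of the interaction matrix A; sigma * u v^T is the
    best rank-1 approximation, a stand-in for the SVD-based view."""
    rows, cols = len(A), len(A[0])
    v = [1.0] * cols
    for _ in range(iters):
        u = [sum(A[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(A[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    u = [sum(A[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = sum(x * x for x in u) ** 0.5
    u = [x / sigma for x in u]
    return [[sigma * u[i] * v[j] for j in range(cols)] for i in range(rows)]
```

The low-rank view keeps the dominant (global) collaborative structure while discarding noisy perturbations, which is the intuition behind using it as a contrastive augmentation.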
    Subject Granular Differential Privacy in Federated Learning. (arXiv:2206.03617v2 [cs.LG] UPDATED)
    This paper considers subject level privacy in the FL setting, where a subject is an individual whose private information is embodied by several data items either confined within a single federation user or distributed across multiple federation users. We propose two new algorithms that enforce subject level DP at each federation user locally. Our first algorithm, called LocalGroupDP, is a straightforward application of group differential privacy in the popular DP-SGD algorithm. Our second algorithm is based on a novel idea of hierarchical gradient averaging (HiGradAvgDP) for subjects participating in a training mini-batch. We also show that user level Local Differential Privacy (LDP) naturally guarantees subject level DP. We observe the problem of horizontal composition of subject level privacy loss in FL - subject level privacy loss incurred at individual users composes across the federation. We formally prove the subject level DP guarantee for our algorithms, and also show their effect on model utility loss. Our empirical evaluation on FEMNIST and Shakespeare datasets shows that LocalGroupDP delivers the best performance among our algorithms. However, its model utility lags behind that of models trained using a DP-SGD based algorithm that provides a weaker item level privacy guarantee. Privacy loss amplification due to subject sampling fractions and horizontal composition remain key challenges for model utility.
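The hierarchical gradient averaging idea can be sketched in a few lines: average each subject's per-item gradients into one vector, clip that vector, then sum across subjects and add Gaussian noise, so each subject's contribution to a step is bounded regardless of how many items they own. The function name and noise calibration below are illustrative, not the paper's exact algorithm:

```python
import random

def higrad_avg_dp_step(grads_by_subject, clip, sigma, seed=0):
    """Hierarchical gradient averaging sketch: per-subject average,
    per-subject clip to L2 norm `clip`, sum, then Gaussian noise --
    bounding each subject's influence on the update."""
    rng = random.Random(seed)
    dim = len(next(iter(grads_by_subject.values()))[0])
    total = [0.0] * dim
    for items in grads_by_subject.values():
        avg = [sum(g[i] for g in items) / len(items) for i in range(dim)]
        norm = sum(x * x for x in avg) ** 0.5
        scale = min(1.0, clip / norm) if norm > 0 else 1.0
        for i in range(dim):
            total[i] += avg[i] * scale
    return [t + rng.gauss(0.0, sigma * clip) for t in total]
```

Contrast with plain DP-SGD, which clips per item: there, a subject with many items can still exert a large aggregate influence on the step.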
    Revisiting the Gumbel-Softmax in MADDPG. (arXiv:2302.11793v2 [cs.LG] UPDATED)
    MADDPG is an algorithm in multi-agent reinforcement learning (MARL) that extends the popular single-agent method, DDPG, to multi-agent scenarios. Importantly, DDPG is an algorithm designed for continuous action spaces, where the gradient of the state-action value function exists. For this algorithm to work in discrete action spaces, discrete gradient estimation must be performed. For MADDPG, the Gumbel-Softmax (GS) estimator is used -- a reparameterisation which relaxes a discrete distribution into a similar continuous one. This method, however, is statistically biased, and a recent MARL benchmarking paper suggests that this bias makes MADDPG perform poorly in grid-world situations, where the action space is discrete. Fortunately, many alternatives to the GS exist, boasting a wide range of properties. This paper explores several of these alternatives and integrates them into MADDPG for discrete grid-world scenarios. The corresponding impact on various performance metrics is then measured and analysed. It is found that one of the proposed estimators performs significantly better than the original GS in several tasks, achieving up to 55% higher returns, along with faster convergence.
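A minimal sketch of the Gumbel-Softmax relaxation MADDPG relies on: perturb each logit with Gumbel(0,1) noise, then apply a temperature-$\tau$ softmax; as $\tau \to 0$ the sample approaches a one-hot draw from the underlying categorical distribution (while the estimator's bias grows as $\tau$ increases):

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, rng=random):
    """Relaxed one-hot sample: add Gumbel(0,1) noise to each logit
    (via -log(-log(U))), then apply a temperature-tau softmax,
    computed stably by subtracting the max."""
    g = [l - math.log(-math.log(rng.random())) for l in logits]
    m = max(x / tau for x in g)
    exps = [math.exp(x / tau - m) for x in g]
    z = sum(exps)
    return [e / z for e in exps]

probs = gumbel_softmax([1.0, 2.0, 0.5], rng=random.Random(0))
```

Because the output is a differentiable function of the logits, the gradient of the state-action value can flow through the sampled action, which is what DDPG-style updates require in a discrete action space.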
    Multivariate Systemic Risk Measures and Computation by Deep Learning Algorithms. (arXiv:2302.10183v2 [cs.LG] UPDATED)
    In this work we propose deep learning-based algorithms for the computation of systemic shortfall risk measures defined via multivariate utility functions. We discuss the key related theoretical aspects, with a particular focus on the fairness properties of primal optima and associated risk allocations. The algorithms we provide allow for learning primal optimizers, optima for the dual representation and corresponding fair risk allocations. We test our algorithms by comparison to a benchmark model, based on a paired exponential utility function, for which we can provide explicit formulas. We also show evidence of convergence in a case for which explicit formulas are not available.
    Optimistic Planning by Regularized Dynamic Programming. (arXiv:2302.14004v3 [cs.LG] UPDATED)
    We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure. This technique allows us to avoid contraction and monotonicity arguments typically required by existing analyses of approximate dynamic programming methods, and in particular to use approximate transition functions estimated via least-squares procedures in MDPs with linear function approximation. We use our method to recover known guarantees in tabular MDPs and to provide a computationally efficient algorithm for learning near-optimal policies in discounted linear mixture MDPs from a single stream of experience, and show it achieves near-optimal statistical guarantees.
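The regularized-update idea can be illustrated with entropy-regularized value iteration, where the hard max over actions is softened into a tau-scaled log-sum-exp; this is a generic sketch of regularized dynamic programming, not the paper's planner:

```python
import math

def regularized_value_iteration(P, R, gamma=0.9, tau=0.1, iters=200):
    """Value iteration with a soft (entropy-regularized) backup:
    V(s) <- tau * logsumexp_a(Q(s, a) / tau), computed stably.
    P[s][a] is a transition distribution, R[s][a] a reward."""
    n = len(R)
    V = [0.0] * n
    for _ in range(iters):
        new_v = []
        for s in range(n):
            qs = [R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n))
                  for a in range(len(R[s]))]
            m = max(qs)
            new_v.append(m + tau * math.log(sum(math.exp((q - m) / tau) for q in qs)))
        V = new_v
    return V
```

As tau shrinks, the soft backup approaches the standard Bellman optimality backup, so on a one-state MDP with rewards 1 and 0 and gamma = 0.9 the value converges to roughly 1 / (1 - 0.9) = 10.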
    Surgical Aggregation: A Collaborative Learning Framework for Harmonizing Distributed Medical Imaging Datasets with Diverse Tasks. (arXiv:2301.06683v4 [cs.CV] UPDATED)
    Large-scale chest x-ray datasets have been curated for the detection of abnormalities using deep learning, with the potential to provide substantial benefits across many clinical applications. However, each dataset focuses only on a subset of findings that can be simultaneously present in a patient, making it challenging to train models that aggregate multiple datasets together. Therefore, data harmonization is crucial to leverage these datasets in aggregate to train clinically useful models with a complete representation of abnormalities that may occur within the thorax. To that end, we propose surgical aggregation, a collaborative learning framework for harmonizing and aggregating knowledge from distributed heterogeneous datasets with partial annotations. We evaluate surgical aggregation across synthetic and real-world heterogeneous datasets with partial annotations. Our results indicate that surgical aggregation outperforms current strategies, generalizes better, and has the potential to facilitate the development of clinically useful models even when using datasets with heterogeneous disease labels.
    Debiasing Recommendation by Learning Identifiable Latent Confounders. (arXiv:2302.05052v2 [cs.LG] UPDATED)
    Recommendation systems aim to predict users' feedback on items not exposed to them. Confounding bias arises due to the presence of unmeasured variables (e.g., the socio-economic status of a user) that can affect both a user's exposure and feedback. Existing methods either (1) make untenable assumptions about these unmeasured variables or (2) directly infer latent confounders from users' exposure. However, they cannot guarantee the identification of counterfactual feedback, which can lead to biased predictions. In this work, we propose a novel method, i.e., identifiable deconfounder (iDCF), which leverages a set of proxy variables (e.g., observed user features) to resolve the aforementioned non-identification issue. The proposed iDCF is a general deconfounded recommendation framework that applies proximal causal inference to infer the unmeasured confounders and identify the counterfactual feedback with theoretical guarantees. Extensive experiments on various real-world and synthetic datasets verify the proposed method's effectiveness and robustness.
    Bandit Social Learning: Exploration under Myopic Behavior. (arXiv:2302.07425v3 [cs.GT] UPDATED)
    We study social learning dynamics where the agents collectively follow a simple multi-armed bandit protocol. Agents arrive sequentially, choose arms, and receive the associated rewards. Each agent observes the full history (arms and rewards) of the previous agents, and there are no private signals. While collectively the agents face an exploration-exploitation tradeoff, each agent acts myopically, without regard to exploration. Motivating scenarios concern reviews and ratings on online platforms. We allow a wide range of myopic behaviors that are consistent with (parameterized) confidence intervals, including the "unbiased" behavior as well as various behavioral biases. While extreme versions of these behaviors correspond to well-known bandit algorithms, we prove that more moderate versions lead to stark exploration failures, and consequently to regret rates that are linear in the number of agents. We provide matching upper bounds on regret by analyzing "moderately optimistic" agents. As a special case of independent interest, we obtain a general result on the failure of the greedy algorithm in multi-armed bandits, the first such result in the literature to the best of our knowledge.
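The stark failure mode is easy to reproduce in simulation: when every agent myopically picks the arm with the best empirical mean over the shared history, an unlucky early draw can lock all subsequent agents onto the inferior arm, so total regret grows linearly. A toy Bernoulli two-arm sketch (arm means and tie-breaking are illustrative choices):

```python
import random

def greedy_social_learning(mu, n_agents, seed=0):
    """Agents arrive sequentially and greedily pick the arm with the
    best empirical mean over the shared history (each arm seeded with
    one initial pull). Returns the total regret versus the best arm."""
    rng = random.Random(seed)
    pulls = [1, 1]
    sums = [float(rng.random() < mu[0]), float(rng.random() < mu[1])]
    best = max(mu)
    regret = 0.0
    for _ in range(n_agents):
        a = 0 if sums[0] / pulls[0] >= sums[1] / pulls[1] else 1
        reward = float(rng.random() < mu[a])
        pulls[a] += 1
        sums[a] += reward
        regret += best - mu[a]
    return regret
```

Averaged over seeds, a constant fraction of runs never recovers from a bad initial draw, which is exactly the linear-regret behavior the paper proves for moderately myopic agents.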
    Tackling Shortcut Learning in Deep Neural Networks: An Iterative Approach with Interpretable Models. (arXiv:2302.10289v6 [cs.LG] UPDATED)
    We use concept-based interpretable models to mitigate shortcut learning, which existing methods address without interpretability. Beginning with a blackbox (BB), we iteratively \emph{carve out} a mixture of interpretable experts (MoIE) and a \emph{residual network}. Each expert explains a subset of the data using First Order Logic (FOL). While explaining a sample, the FOL from the biased BB-derived MoIE detects the shortcut effectively. Finetuning the BB with Metadata Normalization (MDN) eliminates the shortcut, and the FOLs from the finetuned-BB-derived MoIE verify its elimination. Our experiments show that MoIE does not hurt the accuracy of the original BB and eliminates shortcuts effectively.
    Learning Manifold Dimensions with Conditional Variational Autoencoders. (arXiv:2302.11756v2 [cs.LG] UPDATED)
    Although the variational autoencoder (VAE) and its conditional extension (CVAE) are capable of state-of-the-art results across multiple domains, their precise behavior is still not fully understood, particularly in the context of data (like images) that lie on or near a low-dimensional manifold. For example, while prior work has suggested that the globally optimal VAE solution can learn the correct manifold dimension, a necessary (but not sufficient) condition for producing samples from the true data distribution, this has never been rigorously proven. Moreover, it remains unclear how such considerations would change when various types of conditioning variables are introduced, or when the data support is extended to a union of manifolds (e.g., as is likely the case for MNIST digits and related). In this work, we address these points by first proving that VAE global minima are indeed capable of recovering the correct manifold dimension. We then extend this result to more general CVAEs, demonstrating practical scenarios whereby the conditioning variables allow the model to adaptively learn manifolds of varying dimension across samples. Our analyses, which have practical implications for various CVAE design choices, are also supported by numerical results on both synthetic and real-world datasets.
    SOBER: Highly Parallel Bayesian Optimization and Bayesian Quadrature over Discrete and Mixed Spaces. (arXiv:2301.11832v3 [cs.LG] UPDATED)
    Batch Bayesian optimisation and Bayesian quadrature have been shown to be sample-efficient methods of performing optimisation and quadrature where expensive-to-evaluate objective functions can be queried in parallel. However, current methods do not scale to large batch sizes -- a frequent desideratum in practice (e.g. drug discovery or simulation-based inference). We present a novel algorithm, SOBER, which permits scalable and diversified batch global optimisation and quadrature with arbitrary acquisition functions and kernels over discrete and mixed spaces. The key to our approach is to reformulate batch selection for global optimisation as a quadrature problem, which relaxes acquisition function maximisation (non-convex) to kernel recombination (convex). Bridging global optimisation and quadrature can efficiently solve both tasks by balancing the merits of exploitative Bayesian optimisation and explorative Bayesian quadrature. We show that SOBER outperforms 11 competitive baselines on 12 synthetic and diverse real-world tasks.
    Continual Vision-based Reinforcement Learning with Group Symmetries. (arXiv:2210.12301v2 [cs.LG] UPDATED)
    Continual reinforcement learning aims to sequentially learn a variety of tasks, retaining the ability to perform previously encountered tasks while simultaneously developing new policies for novel tasks. However, current continual RL approaches overlook the fact that certain tasks are identical under basic group operations like rotations or translations, especially with visual inputs. They may unnecessarily learn and maintain a new policy for each similar task, leading to poor sample efficiency and weak generalization capability. To address this, we introduce a unique Continual Vision-based Reinforcement Learning method that recognizes Group Symmetries, called COVERS, cultivating a policy for each group of equivalent tasks rather than individual tasks. COVERS employs a proximal policy optimization-based RL algorithm with an equivariant feature extractor and a novel task grouping mechanism that relies on the extracted invariant features. We evaluate COVERS on sequences of table-top manipulation tasks that incorporate image observations and robot proprioceptive information in both simulations and on real robot platforms. Our results show that COVERS accurately assigns tasks to their respective groups and significantly outperforms existing methods in terms of generalization capability.
    Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language. (arXiv:2212.07525v2 [cs.LG] UPDATED)
    Current self-supervised learning algorithms are often modality-specific and require large amounts of computational resources. To address these issues, we increase the training efficiency of data2vec, a learning objective that generalizes across several modalities. We do not encode masked tokens, and we use a fast convolutional decoder and amortize the effort to build teacher representations. data2vec 2.0 benefits from the rich contextualized target representations introduced in data2vec, which enable a fast self-supervised learner. Experiments on ImageNet-1K image classification show that data2vec 2.0 matches the accuracy of Masked Autoencoders in 16.4x lower pre-training time, on Librispeech speech recognition it performs as well as wav2vec 2.0 in 10.6x less time, and on GLUE natural language understanding it matches a retrained RoBERTa model in half the time. Trading some speed for accuracy results in ImageNet-1K top-1 accuracy of 86.8\% with a ViT-L model trained for 150 epochs.
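    The contextualized teacher targets come from a momentum-updated teacher network. Below is a minimal sketch of that update rule (the standard exponential-moving-average recipe behind data2vec-style teachers; the dict-of-floats parameters are illustrative stand-ins for real tensors):

```python
def ema_update(teacher, student, tau=0.999):
    """Move each teacher parameter a small step toward the student (EMA).

    `tau` close to 1 makes the teacher a slowly trailing average of the
    student, which is what yields stable target representations.
    """
    return {name: tau * teacher[name] + (1.0 - tau) * student[name]
            for name in teacher}
```

    In data2vec 2.0 the cost of running this teacher is further amortized by reusing one teacher representation as the target for several masked views of the same sample.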
    Stop overkilling simple tasks with black-box models and use transparent models instead. (arXiv:2302.02804v2 [cs.LG] UPDATED)
    In recent years, the employment of deep learning methods has led to several significant breakthroughs in artificial intelligence. Different from traditional machine learning models, deep learning-based approaches are able to extract features autonomously from raw data. This allows for bypassing the feature engineering process, which is generally considered to be both error-prone and tedious. Moreover, deep learning strategies often outperform traditional models in terms of accuracy.
    Compositional Scene Representation Learning via Reconstruction: A Survey. (arXiv:2202.07135v4 [cs.LG] UPDATED)
    Visual scenes are composed of visual concepts and have the property of combinatorial explosion. An important reason for humans to efficiently learn from diverse visual scenes is the ability of compositional perception, and it is desirable for artificial intelligence to have similar abilities. Compositional scene representation learning is a task that enables such abilities. In recent years, various methods have been proposed to apply deep neural networks, which have been proven to be advantageous in representation learning, to learn compositional scene representations via reconstruction, advancing this research direction into the deep learning era. Learning via reconstruction is advantageous because it may utilize massive unlabeled data and avoid costly and laborious data annotation. In this survey, we first outline the current progress on reconstruction-based compositional scene representation learning with deep neural networks, including development history and categorizations of existing methods from the perspectives of the modeling of visual scenes and the inference of scene representations; then provide benchmarks, including an open source toolbox to reproduce the benchmark experiments, of representative methods that consider the most extensively studied problem setting and form the foundation for other methods; and finally discuss the limitations of existing methods and future directions of this research topic.
    From Human Days to Machine Seconds: Automatically Answering and Generating Machine Learning Final Exams. (arXiv:2206.05442v6 [cs.LG] UPDATED)
    A final exam in machine learning at a top institution such as MIT, Harvard, or Cornell typically takes faculty days to write, and students hours to solve. We demonstrate that large language models pass machine learning finals at a human level on a corpus drawn from MIT, Harvard, and Cornell and automatically generate new human-quality final exam questions in seconds. Previous work has developed program synthesis and few-shot learning methods to solve university-level problem set questions in mathematics and STEM courses. In this work, we develop and compare methods that solve final exams, which differ from problem sets in several ways: the questions are longer, have multiple parts, are more complicated, and span a broader set of topics. We provide a new dataset and benchmark of questions from machine learning final exams and code for answering these questions and generating new questions. We show how to generate new questions from other questions and course notes. We evaluate a large open language model, Meta's OPT, and compare the results with OpenAI's closed models. A student survey comparing the quality, appropriateness, and difficulty of machine-generated questions with human-written questions shows that across multiple aspects, machine-generated questions are indistinguishable from human-generated questions and are suitable for final exams. We perform ablation studies comparing zero-shot learning with few-shot learning and chain-of-thought prompting using GPT-3, OPT, Codex, and ChatGPT across machine learning topics and find that few-shot learning methods perform best. We highlight the transformative potential of language models to streamline the writing and solution of large-scale assessments, significantly reducing the workload from human days to machine seconds.
    Adversarial Cheap Talk. (arXiv:2211.11030v2 [cs.LG] UPDATED)
    Adversarial attacks in reinforcement learning (RL) often assume highly-privileged access to the victim's parameters, environment, or data. Instead, this paper proposes a novel adversarial setting called a Cheap Talk MDP in which an Adversary can merely append deterministic messages to the Victim's observation, resulting in a minimal range of influence. The Adversary cannot occlude ground truth, influence underlying environment dynamics or reward signals, introduce non-stationarity, add stochasticity, see the Victim's actions, or access their parameters. Additionally, we present a simple meta-learning algorithm called Adversarial Cheap Talk (ACT) to train Adversaries in this setting. We demonstrate that an Adversary trained with ACT still significantly influences the Victim's training and testing performance, despite the highly constrained setting. Affecting train-time performance reveals a new attack vector and provides insight into the success and failure modes of existing RL algorithms. More specifically, we show that an ACT Adversary is capable of harming performance by interfering with the learner's function approximation, or instead helping the Victim's performance by outputting useful features. Finally, we show that an ACT Adversary can manipulate messages during train-time to directly and arbitrarily control the Victim at test-time. Project video and code are available at https://sites.google.com/view/adversarial-cheap-talk
    Early heart disease prediction using hybrid quantum classification. (arXiv:2208.08882v2 [quant-ph] UPDATED)
    The rates of heart morbidity and heart mortality are increasing significantly, which affects global public health and the world economy. Early prediction of heart disease is crucial for reducing heart morbidity and mortality. This paper proposes two quantum machine learning methods, i.e., a hybrid quantum neural network and a hybrid random forest quantum neural network, for early detection of heart disease. The methods are applied to the Cleveland and Statlog datasets. The results show that the hybrid quantum neural network and the hybrid random forest quantum neural network are suitable for high-dimensional and low-dimensional problems, respectively. The hybrid quantum neural network is sensitive to outlier data, while the hybrid random forest is robust to outlier data. A comparison between different machine learning methods shows that the proposed quantum methods are more appropriate for early heart disease prediction, with areas under the curve of 96.43% and 97.78% obtained for the Cleveland and Statlog datasets, respectively.
    AdaProp: Learning Adaptive Propagation for Graph Neural Network based Knowledge Graph Reasoning. (arXiv:2205.15319v2 [cs.LG] UPDATED)
    Due to the popularity of Graph Neural Networks (GNNs), various GNN-based methods have been designed to reason on knowledge graphs (KGs). An important design component of GNN-based KG reasoning methods is called the propagation path, which contains a set of involved entities in each propagation step. Existing methods use hand-designed propagation paths, ignoring the correlation between the entities and the query relation. In addition, the number of involved entities will explosively grow at larger propagation steps. In this work, we are motivated to learn an adaptive propagation path in order to filter out irrelevant entities while preserving promising targets. First, we design an incremental sampling mechanism where the nearby targets and layer-wise connections can be preserved with linear complexity. Second, we design a learning-based sampling distribution to identify the semantically related entities. Extensive experiments show that our method is powerful, efficient, and semantic-aware. The code is available at https://github.com/LARS-research/AdaProp.
    On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning. (arXiv:2210.10763v2 [cs.LG] UPDATED)
    Reinforcement Learning (RL) algorithms can solve challenging control problems directly from image observations, but they often require millions of environment interactions to do so. Recently, model-based RL algorithms have greatly improved sample-efficiency by concurrently learning an internal model of the world, and supplementing real environment interactions with imagined rollouts for policy improvement. However, learning an effective model of the world from scratch is challenging, and in stark contrast to humans that rely heavily on world understanding and visual cues for learning new skills. In this work, we investigate whether internal models learned by modern model-based RL algorithms can be leveraged to solve new, distinctly different tasks faster. We propose Model-Based Cross-Task Transfer (XTRA), a framework for sample-efficient online RL with scalable pretraining and finetuning of learned world models. By offline multi-task pretraining and online cross-task finetuning, we achieve substantial improvements over a baseline trained from scratch; we improve mean performance of model-based algorithm EfficientZero by 23%, and by as much as 71% in some instances.
    Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adapters. (arXiv:2211.11031v3 [cs.LG] UPDATED)
    Deployed models decay over time due to shifting inputs, changing user needs, or emergent knowledge gaps. When harmful behaviors are identified, targeted edits are required. However, current model editors, which adjust specific behaviors of pre-trained models, degrade model performance over multiple edits. We propose GRACE, a Lifelong Model Editing method, which implements spot-fixes on streaming errors of a deployed model, ensuring minimal impact on unrelated inputs. GRACE writes new mappings into a pre-trained model's latent space, creating a discrete, local codebook of edits without altering model weights. This is the first method enabling thousands of sequential edits using only streaming errors. Our experiments on T5, BERT, and GPT models show GRACE's state-of-the-art performance in making and retaining edits, while generalizing to unseen inputs. Our code is available at https://www.github.com/thartvigsen/grace.
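    The codebook mechanism can be sketched in a few lines. This is an illustrative reimplementation of the general idea only (nearest-key lookup with a deferral radius), not the authors' code; the class name, fixed radius, and Euclidean distance are assumptions:

```python
import numpy as np

class GraceStyleAdapter:
    """Discrete key-value edit codebook, GRACE-style (illustrative sketch)."""

    def __init__(self, init_radius=1.0):
        self.keys, self.values, self.radii = [], [], []
        self.init_radius = init_radius

    def edit(self, hidden, corrected):
        # Cache the hidden state that triggered the edit as a key, and the
        # corrected activation as its value; model weights are never touched.
        self.keys.append(np.asarray(hidden, dtype=float))
        self.values.append(np.asarray(corrected, dtype=float))
        self.radii.append(self.init_radius)

    def __call__(self, hidden, passthrough):
        hidden = np.asarray(hidden, dtype=float)
        for key, value, radius in zip(self.keys, self.values, self.radii):
            if np.linalg.norm(hidden - key) <= radius:
                return value      # nearby input: retrieve the stored spot-fix
        return passthrough        # unrelated input: original activation flows through
```

    Edits accumulate as entries in the lookup table, so sequential fixes only grow the codebook, while inputs that match no key pass through unchanged.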
    Physics-informed neural networks for gravity currents reconstruction from limited data. (arXiv:2211.09715v2 [physics.flu-dyn] UPDATED)
    The present work investigates the use of physics-informed neural networks (PINNs) for the 3D reconstruction of unsteady gravity currents from limited data. In the PINN context, the flow fields are reconstructed by training a neural network whose objective function penalizes the mismatch between the network predictions and the observed data and embeds the underlying equations using automatic differentiation. This study relies on a high-fidelity numerical experiment of the canonical lock-exchange configuration. This allows us to benchmark quantitatively the PINNs reconstruction capabilities on several training databases that mimic state-of-the-art experimental measurement techniques for density and velocity. Notably, spatially averaged density measurements by light attenuation technique (LAT) are employed for the training procedure. An optimal experimental setup for flow reconstruction by PINNs is proposed according to two criteria: the implementation complexity and the accuracy of the inferred fields.
    Investigating the Impact of Model Width and Density on Generalization in Presence of Label Noise. (arXiv:2208.08003v4 [cs.LG] UPDATED)
    Increasing the size of overparameterized neural networks has been key to achieving state-of-the-art performance. This is captured by the double descent phenomenon, where the test loss follows a decreasing-increasing-decreasing pattern as model width increases. However, the effect of label noise on the test loss curve has not been fully explored. In this work, we uncover an intriguing phenomenon where label noise leads to a \textit{final ascent} in the originally observed double descent curve. Specifically, under a sufficiently large noise-to-sample-size ratio, optimal generalization is achieved at intermediate widths. Through theoretical analysis, we attribute this phenomenon to the shape transition of test loss variance induced by label noise. Furthermore, we extend the final ascent phenomenon to model density and provide the first theoretical characterization showing that reducing density by randomly dropping trainable parameters improves generalization under label noise. We also thoroughly examine the roles of regularization and sample size. Surprisingly, we find that larger $\ell_2$ regularization and robust learning methods against label noise exacerbate the final ascent. We confirm the validity of our findings through extensive experiments on ReLU networks trained on MNIST, ResNets trained on CIFAR-10/100, and InceptionResNet-v2 trained on Stanford Cars with real-world noisy labels.
    A Theoretical Framework for AI Models Explainability with Application in Biomedicine. (arXiv:2212.14447v4 [cs.AI] UPDATED)
    EXplainable Artificial Intelligence (XAI) is a vibrant research topic in the artificial intelligence community, with growing interest across methods and domains. Much has been written about the subject, yet XAI still lacks shared terminology and a framework capable of providing structural soundness to explanations. In our work, we address these issues by proposing a novel definition of explanation that is a synthesis of what can be found in the literature. We recognize that explanations are not atomic but the combination of evidence stemming from the model and its input-output mapping, and the human interpretation of this evidence. Furthermore, we fit explanations into the properties of faithfulness (i.e., the explanation being a true description of the model's inner workings and decision-making process) and plausibility (i.e., how much the explanation looks convincing to the user). Using our proposed theoretical framework simplifies how these properties are operationalized and it provides new insight into common explanation methods that we analyze as case studies.
    GREAD: Graph Neural Reaction-Diffusion Networks. (arXiv:2211.14208v3 [cs.LG] UPDATED)
    Graph neural networks (GNNs) are one of the most popular research topics for deep learning. GNN methods typically have been designed on top of the graph signal processing theory. In particular, diffusion equations have been widely used for designing the core processing layer of GNNs, and therefore they are inevitably vulnerable to the notorious oversmoothing problem. Recently, a couple of papers paid attention to reaction equations in conjunction with diffusion equations. However, they all consider limited forms of reaction equations. To address this, we present a reaction-diffusion equation-based GNN method that considers all popular types of reaction equations in addition to one special reaction equation designed by us. To our knowledge, our paper is one of the most comprehensive studies on reaction-diffusion equation-based GNNs. In our experiments with 9 datasets and 28 baselines, our method, called GREAD, outperforms them in a majority of cases. Further synthetic data experiments show that it mitigates the oversmoothing problem and works well for various homophily rates.
    Benchmarking simulated and physical quantum processing units using quantum and hybrid algorithms. (arXiv:2211.15631v2 [quant-ph] UPDATED)
    Powerful hardware services and software libraries are vital tools for quickly and affordably designing, testing, and executing quantum algorithms. A robust large-scale study of how the performance of these platforms scales with the number of qubits is key to providing quantum solutions to challenging industry problems. This work benchmarks the runtime and accuracy for a representative sample of specialized high-performance simulated and physical quantum processing units. Results show the QMware simulator can reduce the runtime for executing a quantum circuit by up to 78% compared to the next fastest option for algorithms with fewer than 27 qubits. The AWS SV1 simulator offers a runtime advantage for larger circuits, up to the maximum 34 qubits available with SV1. Beyond this limit, QMware can execute circuits as large as 40 qubits. Physical quantum devices, such as Rigetti's Aspen-M2, can provide an exponential runtime advantage for circuits with more than 30 qubits. However, the high financial cost of physical quantum processing units presents a serious barrier to practical use. Moreover, only IonQ's Harmony quantum device achieves high fidelity with more than four qubits. This study paves the way to understanding the optimal combination of available software and hardware for executing practical quantum algorithms.
    TIDE: Time Derivative Diffusion for Deep Learning on Graphs. (arXiv:2212.02483v2 [cs.LG] UPDATED)
    A prominent paradigm for graph neural networks is based on the message-passing framework. In this framework, information communication is realized only between neighboring nodes. The challenge of approaches that use this paradigm is to ensure efficient and accurate long-distance communication between nodes, as deep convolutional networks are prone to oversmoothing. In this paper, we present a novel method based on time derivative graph diffusion (TIDE) to overcome these structural limitations of the message-passing framework. Our approach allows for optimizing the spatial extent of diffusion across various tasks and network channels, thus enabling medium and long-distance communication efficiently. Furthermore, we show that our architecture design also enables local message-passing and thus inherits the capabilities of local message-passing approaches. We show that on both widely used graph benchmarks and synthetic mesh and graph datasets, the proposed framework outperforms state-of-the-art methods by a significant margin.
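    The core operator behind this kind of diffusion-based propagation is the graph heat kernel with a tunable diffusion time. The sketch below is an illustrative NumPy version (eigendecomposition in place of a learned, differentiable solver; the function names and toy graph are assumptions, not the paper's code):

```python
import numpy as np

def heat_diffusion(adj, features, t):
    """Propagate node features with the heat kernel exp(-t * L).

    In TIDE-like architectures the diffusion time t would be a learnable
    (per-channel) parameter; here we simply apply the kernel for a fixed t.
    """
    lap = np.diag(adj.sum(axis=1)) - adj          # combinatorial graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(lap)        # Laplacian is symmetric, so eigh applies
    kernel = eigvecs @ np.diag(np.exp(-t * eigvals)) @ eigvecs.T
    return kernel @ features

# Toy path graph 0-1-2 with unit mass on node 0: a larger t carries more of
# that mass to the far node, i.e. longer-distance communication.
adj = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
x = np.array([[1.0], [0.0], [0.0]])
```

    Small t recovers local, message-passing-like behavior, while large t lets distant nodes exchange information directly; this is the trade-off a learned t can optimize per task and channel.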
    HiveNAS: Neural Architecture Search using Artificial Bee Colony Optimization. (arXiv:2211.10250v2 [cs.NE] UPDATED)
    The traditional Neural Network-development process requires substantial expert knowledge and relies heavily on intuition and trial-and-error. Neural Architecture Search (NAS) frameworks were introduced to robustly search for network topologies, as well as facilitate the automated development of Neural Networks. While some optimization approaches -- such as Genetic Algorithms -- have been extensively explored in the NAS context, other Metaheuristic Optimization algorithms have not yet been investigated. In this study, we evaluate the viability of Artificial Bee Colony optimization for Neural Architecture Search. Our proposed framework, HiveNAS, outperforms existing state-of-the-art Swarm Intelligence-based NAS frameworks in a fraction of the time.
    Neuroevolution is a Competitive Alternative to Reinforcement Learning for Skill Discovery. (arXiv:2210.03516v3 [cs.NE] UPDATED)
    Deep Reinforcement Learning (RL) has emerged as a powerful paradigm for training neural policies to solve complex control tasks. However, these policies tend to be overfit to the exact specifications of the task and environment they were trained on, and thus do not perform well when conditions deviate slightly or when composed hierarchically to solve even more complex tasks. Recent work has shown that training a mixture of policies, as opposed to a single one, that are driven to explore different regions of the state-action space can address this shortcoming by generating a diverse set of behaviors, referred to as skills, that can be collectively used to great effect in adaptation tasks or for hierarchical planning. This is typically realized by including a diversity term - often derived from information theory - in the objective function optimized by RL. However, these approaches often require careful hyperparameter tuning to be effective. In this work, we demonstrate that less widely-used neuroevolution methods, specifically Quality Diversity (QD), are a competitive alternative to information-theory-augmented RL for skill discovery. Through an extensive empirical evaluation comparing eight state-of-the-art algorithms (four flagship algorithms from each line of work) on the basis of (i) metrics directly evaluating the skills' diversity, (ii) the skills' performance on adaptation tasks, and (iii) the skills' performance when used as primitives for hierarchical planning, QD methods are found to provide equal, and sometimes improved, performance whilst being less sensitive to hyperparameters and more scalable. As no single method is found to provide near-optimal performance across all environments, there is a rich scope for further research which we support by proposing future directions and providing optimized open-source implementations.
    FP-Diffusion: Improving Score-based Diffusion Models by Enforcing the Underlying Score Fokker-Planck Equation. (arXiv:2210.04296v4 [cs.LG] UPDATED)
    Score-based generative models (SGMs) learn a family of noise-conditional score functions corresponding to the data density perturbed with increasingly large amounts of noise. These perturbed data densities are linked together by the Fokker-Planck equation (FPE), a partial differential equation (PDE) governing the spatial-temporal evolution of a density undergoing a diffusion process. In this work, we derive a corresponding equation called the score FPE that characterizes the noise-conditional scores of the perturbed data densities (i.e., their gradients). Surprisingly, despite the impressive empirical performance, we observe that scores learned through denoising score matching (DSM) fail to fulfill the underlying score FPE, which is an inherent self-consistency property of the ground truth score. We prove that satisfying the score FPE is desirable as it improves the likelihood and the degree of conservativity. Hence, we propose to regularize the DSM objective to enforce satisfaction of the score FPE, and we show the effectiveness of this approach across various datasets.
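    For reference, the FPE referred to here is the standard one for a forward diffusion $dx = f(x,t)\,dt + g(t)\,dw$ (conventional notation, not taken from the abstract); the paper's score FPE follows by expressing this identity in terms of the score $s_t(x) = \nabla_x \log p_t(x)$:

```latex
\partial_t p_t(x) \;=\; -\,\nabla_x \cdot \big( f(x,t)\, p_t(x) \big) \;+\; \tfrac{1}{2}\, g(t)^2\, \Delta_x p_t(x)
```

    A score network trained only by denoising score matching need not satisfy the induced identity on scores, and that self-consistency gap is what the proposed regularizer penalizes.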
    Automated Scoring for Reading Comprehension via In-context BERT Tuning. (arXiv:2205.09864v2 [cs.LG] UPDATED)
    Automated scoring of open-ended student responses has the potential to significantly reduce human grader effort. Recent advances in automated scoring often leverage textual representations based on pre-trained language models such as BERT and GPT as input to scoring models. Most existing approaches train a separate model for each item/question, which is suitable for scenarios such as essay scoring where items can be quite different from one another. However, these approaches have two limitations: 1) they fail to leverage item linkage for scenarios such as reading comprehension where multiple items may share a reading passage; 2) they are not scalable since storing one model per item becomes difficult when models have a large number of parameters. In this paper, we report our (grand prize-winning) solution to the National Assessment of Education Progress (NAEP) automated scoring challenge for reading comprehension. Our approach, in-context BERT fine-tuning, produces a single shared scoring model for all items with a carefully-designed input structure to provide contextual information on each item. We demonstrate the effectiveness of our approach via local evaluations using the training dataset provided by the challenge. We also discuss the biases, common error types, and limitations of our approach.
    Bayesian Fixed-Budget Best-Arm Identification. (arXiv:2211.08572v3 [cs.LG] UPDATED)
    Fixed-budget best-arm identification (BAI) is a bandit problem where the agent maximizes the probability of identifying the optimal arm within a fixed budget of observations. In this work, we study this problem in the Bayesian setting. We propose a Bayesian elimination algorithm and derive an upper bound on its probability of misidentifying the optimal arm. The bound reflects the quality of the prior and is the first distribution-dependent bound in this setting. We prove it using a frequentist-like argument, where we carry the prior through, and then integrate out the bandit instance at the end. We also provide a lower bound on the probability of misidentification in a $2$-armed Bayesian bandit and show that our upper bound (almost) matches it for any budget. Our experiments show that Bayesian elimination is superior to frequentist methods and competitive with the state-of-the-art Bayesian algorithms that have no guarantees in our setting.
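    A minimal sketch of elimination in the Gaussian case (illustrative only; the halving schedule, zero-mean prior, and phase sizes are our assumptions, not the paper's exact algorithm or analysis):

```python
import numpy as np

def bayesian_elimination(means, budget, prior_var=1.0, noise_var=1.0, seed=0):
    """Fixed-budget elimination with Gaussian posterior means (sketch).

    Arms are pulled in phases; after each phase, the half of the surviving
    arms with the lowest posterior means is eliminated, until one remains.
    """
    rng = np.random.default_rng(seed)
    arms = list(range(len(means)))
    rounds = max(int(np.ceil(np.log2(len(arms)))), 1)
    per_phase = budget // rounds
    counts = np.zeros(len(means))
    sums = np.zeros(len(means))
    for _ in range(rounds):
        pulls = max(per_phase // len(arms), 1)
        for a in arms:
            obs = rng.normal(means[a], np.sqrt(noise_var), pulls)
            counts[a] += pulls
            sums[a] += obs.sum()
        # Posterior mean of each arm under a zero-mean N(0, prior_var) prior
        post = (sums / noise_var) / (1.0 / prior_var + counts / noise_var)
        arms = sorted(arms, key=lambda a: post[a], reverse=True)[: max(len(arms) // 2, 1)]
    return arms[0]
```

    With a well-separated best arm and a reasonable budget, this reliably identifies the optimum; the paper's contribution is bounding the misidentification probability of such a procedure in a way that reflects the quality of the prior.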
    LMD: A Learnable Mask Network to Detect Adversarial Examples for Speaker Verification. (arXiv:2211.00825v2 [eess.AS] UPDATED)
    Although the security of automatic speaker verification (ASV) is seriously threatened by recently emerged adversarial attacks, there have been some countermeasures to alleviate the threat. However, many defense approaches not only require the prior knowledge of the attackers but also possess weak interpretability. To address this issue, in this paper, we propose an attacker-independent and interpretable method, named learnable mask detector (LMD), to separate adversarial examples from the genuine ones. It utilizes score variation as an indicator to detect adversarial examples, where the score variation is the absolute discrepancy between the ASV scores of an original audio recording and its transformed audio synthesized from its masked complex spectrogram. A core component of the score variation detector is to generate the masked spectrogram by a neural network. The neural network needs only genuine examples for training, which makes it an attacker-independent approach. Its interpretability lies in the fact that the neural network is trained to minimize the score variation of the targeted ASV, and maximize the number of the masked spectrogram bins of the genuine training examples. Its foundation is based on the observation that masking out the vast majority of the spectrogram bins with little speaker information will inevitably introduce a large score variation to the adversarial example, and a small score variation to the genuine example. Experimental results with 12 attackers and two representative ASV systems show that our proposed method outperforms five state-of-the-art baselines. The extensive experimental results can also be a benchmark for the detection-based ASV defenses.
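    The decision rule itself is simple enough to sketch. Here `asv_score` and `mask_fn` are toy stand-ins for the real ASV system and the learned masking network (both are assumptions for illustration, not the paper's implementation):

```python
import numpy as np

def score_variation_detect(asv_score, mask_fn, audio, threshold):
    """Flag an utterance as adversarial when masking shifts its ASV score
    by more than `threshold` (LMD-style score-variation rule, sketch)."""
    variation = abs(asv_score(audio) - asv_score(mask_fn(audio)))
    return variation > threshold

# Toy stand-ins: the "ASV score" is the mean amplitude, and "masking" clips
# away energy above 0.5, which removes an additive adversarial perturbation
# while leaving the genuine signal untouched.
asv = lambda x: float(np.mean(x))
mask = lambda x: np.clip(x, 0.0, 0.5)
genuine = np.full(8, 0.5)
adversarial = genuine + 0.5
```

    The genuine signal is unchanged by masking (small variation, not flagged), while the perturbed one loses its adversarial component (large variation, flagged), mirroring the observation the method is built on.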
    How Does Pseudo-Labeling Affect the Generalization Error of the Semi-Supervised Gibbs Algorithm?. (arXiv:2210.08188v2 [cs.IT] UPDATED)
    We provide an exact characterization of the expected generalization error (gen-error) for semi-supervised learning (SSL) with pseudo-labeling via the Gibbs algorithm. The gen-error is expressed in terms of the symmetrized KL information between the output hypothesis, the pseudo-labeled dataset, and the labeled dataset. Distribution-free upper and lower bounds on the gen-error can also be obtained. Our findings offer new insights that the generalization performance of SSL with pseudo-labeling is affected not only by the information between the output hypothesis and input training data but also by the information {\em shared} between the {\em labeled} and {\em pseudo-labeled} data samples. This serves as a guideline to choose an appropriate pseudo-labeling method from a given family of methods. To deepen our understanding, we further explore two examples -- mean estimation and logistic regression. In particular, we analyze how the ratio of the number of unlabeled to labeled data $\lambda$ affects the gen-error under both scenarios. As $\lambda$ increases, the gen-error for mean estimation decreases and then saturates at a value larger than when all the samples are labeled, and the gap can be quantified {\em exactly} with our analysis, and is dependent on the \emph{cross-covariance} between the labeled and pseudo-labeled data samples. For logistic regression, the gen-error and the variance component of the excess risk also decrease as $\lambda$ increases.
    Multi-channel Autobidding with Budget and ROI Constraints. (arXiv:2302.01523v3 [cs.GT] UPDATED)
    In digital online advertising, advertisers procure ad impressions simultaneously on multiple platforms, or so-called channels, such as Google Ads, Meta Ads Manager, etc., each of which consists of numerous ad auctions. We study how an advertiser maximizes total conversion (e.g. ad clicks) while satisfying aggregate return-on-investment (ROI) and budget constraints across all channels. In practice, an advertiser does not have control over, and thus cannot globally optimize, which individual ad auctions she participates in for each channel, and instead authorizes a channel to procure impressions on her behalf: the advertiser can only utilize two levers on each channel, namely setting a per-channel budget and per-channel target ROI. In this work, we first analyze the effectiveness of each of these levers for solving the advertiser's global multi-channel problem. We show that when an advertiser only optimizes over per-channel ROIs, her total conversion can be arbitrarily worse than what she could have obtained in the global problem. Further, we show that the advertiser can achieve the global optimal conversion when she only optimizes over per-channel budgets. In light of this finding, under a bandit feedback setting that mimics real-world scenarios where advertisers have limited information on ad auctions in each channel and how channels procure ads, we present an efficient learning algorithm that produces per-channel budgets whose resulting conversion approximates that of the global optimal problem. Finally, we argue that all our results hold for both single-item and multi-item auctions from which channels procure impressions on advertisers' behalf.
    CORL: Research-oriented Deep Offline Reinforcement Learning Library. (arXiv:2210.07105v3 [cs.LG] UPDATED)
CORL is an open-source library that provides thoroughly benchmarked single-file implementations of both deep offline and offline-to-online reinforcement learning algorithms. It emphasizes a simple development experience with a straightforward codebase and a modern analysis tracking tool. In CORL, we isolate method implementations in separate single files, making performance-relevant details easier to recognize. Additionally, an experiment tracking feature is available to help log metrics, hyperparameters, dependencies, and more to the cloud. Finally, we have ensured the reliability of the implementations by benchmarking on commonly employed D4RL datasets, providing a transparent source of results that can be reused by robust evaluation tools such as performance profiles, probability of improvement, or expected online performance.
    Where Will Players Move Next? Dynamic Graphs and Hierarchical Fusion for Movement Forecasting in Badminton. (arXiv:2211.12217v2 [cs.LG] UPDATED)
Sports analytics has captured increasing attention since analysis of the various data enables insights for training strategies, player evaluation, etc. In this paper, we focus on predicting what types of returning strokes will be made and where players will move based on previous strokes. As this problem has not been addressed to date, movement forecasting can be tackled through sequence-based and graph-based models by formulating it as a sequence prediction task. However, existing sequence-based models neglect the effects of interactions between players, and graph-based models still suffer from multifaceted perspectives on the next movement. Moreover, there is no existing work on representing strategic relations among players' shot types and movements. To address these challenges, we first introduce the procedure of the Player Movements (PM) graph to exploit the structural movements of players with strategic relations. Based on the PM graph, we propose a novel Dynamic Graphs and Hierarchical Fusion for Movement Forecasting model (DyMF) with interaction style extractors to capture the mutual interactions of players themselves and between both players within a rally, as well as dynamic players' tactics across time. In addition, hierarchical fusion modules are designed to incorporate the style influence of both players and rally interactions. Extensive experiments show that our model empirically outperforms both sequence- and graph-based methods and demonstrates the practical usage of movement forecasting.
    Second-order optimization with lazy Hessians. (arXiv:2212.00781v3 [math.OC] UPDATED)
We analyze Newton's method with lazy Hessian updates for solving general, possibly non-convex optimization problems. We propose to reuse a previously seen Hessian for several iterations while computing new gradients at each step of the method. This significantly reduces the overall arithmetical complexity of second-order optimization schemes. By using the cubic regularization technique, we establish fast global convergence of our method to a second-order stationary point, while the Hessian does not need to be updated each iteration. For convex problems, we justify global and local superlinear rates for lazy Newton steps with quadratic regularization, which is easier to compute. The optimal frequency for updating the Hessian is once every $d$ iterations, where $d$ is the dimension of the problem. This provably improves the total arithmetical complexity of second-order algorithms by a factor $\sqrt{d}$.
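A minimal sketch of the lazy-Hessian idea follows: plain Newton steps without the cubic regularization the paper relies on for its guarantees. Gradients are recomputed every iteration, the Hessian only every few; the 2-D objective is an illustrative toy.

```python
def lazy_newton(grad, hess, x0, refresh_every=2, iters=40):
    """Newton's method that reuses a stale Hessian for several steps,
    recomputing it only every `refresh_every` iterations (a simplified
    sketch of the lazy-Hessian scheme, without cubic regularization)."""
    x = list(x0)
    H = None
    for t in range(iters):
        if t % refresh_every == 0:
            H = hess(x)          # expensive: refreshed lazily
        g = grad(x)              # cheap: refreshed every step
        # Solve the 2x2 system H s = g by Cramer's rule.
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        s0 = (g[0] * H[1][1] - g[1] * H[0][1]) / det
        s1 = (H[0][0] * g[1] - H[1][0] * g[0]) / det
        x = [x[0] - s0, x[1] - s1]
    return x

# Toy objective f(x, y) = x**4 / 4 + y**2 / 2, minimized at (0, 0).
grad = lambda x: [x[0] ** 3, x[1]]
hess = lambda x: [[3 * x[0] ** 2, 0.0], [0.0, 1.0]]
x_star = lazy_newton(grad, hess, [1.0, 1.0])
```

Even with the Hessian refreshed only every other step, the iterates still contract toward the minimizer; the paper's analysis makes this tradeoff precise for refresh period $d$.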
    Mnemosyne: Learning to Train Transformers with Transformers. (arXiv:2302.01128v2 [cs.LG] UPDATED)
Training complex machine learning (ML) architectures requires a compute- and time-consuming process of selecting the right optimizer and tuning its hyper-parameters. A new paradigm of learning optimizers from data has emerged as a better alternative to hand-designed ML optimizers. We propose the Mnemosyne optimizer, which uses Performers: implicit low-rank attention Transformers. It can learn to train entire neural network architectures, including other Transformers, without any task-specific optimizer tuning. We show that Mnemosyne: (a) generalizes better than popular LSTM optimizers, (b) in particular can successfully train Vision Transformers (ViTs) while meta-trained on standard MLPs and (c) can initialize optimizers for faster convergence in Robotics applications. We believe that these results open the possibility of using Transformers to build foundational optimization models that can address the challenges of regular Transformer training. We complement our results with an extensive theoretical analysis of the compact associative memory used by Mnemosyne.
    Self-Supervised Training with Autoencoders for Visual Anomaly Detection. (arXiv:2206.11723v3 [cs.CV] UPDATED)
Deep convolutional autoencoders provide an effective tool for learning non-linear dimensionality reduction in an unsupervised way. Recently, they have been used for the task of anomaly detection in the visual domain. By optimising for the reconstruction error using anomaly-free examples, the common belief is that a corresponding network should fail to accurately reconstruct anomalous regions in the application phase. This goal is typically addressed by controlling the capacity of the network by either reducing the size of the bottleneck layer or enforcing sparsity constraints on its activations. However, neither of these techniques explicitly penalizes the reconstruction of anomalous signals, often resulting in poor detection. We tackle this problem by adapting a self-supervised learning regime that allows us to use discriminative information during training while focusing on the data manifold by means of a modified reconstruction error. This regularizes the model to produce locally consistent reconstructions, while replacing irregularities, thus acting as a filter for anomalous patterns. In contrast to related approaches, inference with our method is very efficient during both training and prediction, processing the entire input image in a single step. Our experiments on the MVTec AD dataset demonstrate high recognition and localization performance of the proposed method. On the texture subset, in particular, our approach consistently outperforms a number of recent anomaly detection methods by a large margin.
    Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models. (arXiv:2210.09759v2 [cs.LG] UPDATED)
In Multi-Task Learning (MTL), tasks may compete and limit the performance achieved on each other, rather than guiding the optimization to a solution, superior to all its single-task trained counterparts. Since there is often not a unique solution optimal for all tasks, practitioners have to balance tradeoffs between tasks' performance, and resort to optimality in the Pareto sense. Most MTL methodologies either completely neglect this aspect, and instead of aiming at learning a Pareto Front, produce one solution predefined by their optimization schemes, or produce diverse but discrete solutions. Recent approaches parameterize the Pareto Front via neural networks, leading to complex mappings from tradeoff to objective space. In this paper, we conjecture that the Pareto Front admits a linear parameterization in parameter space, which leads us to propose \textit{Pareto Manifold Learning}, an ensembling method in weight space. Our approach produces a continuous Pareto Front in a single training run, which allows the performance on each task to be modulated during inference. Experiments on multi-task learning benchmarks, ranging from image classification to tabular datasets and scene understanding, show that \textit{Pareto Manifold Learning} outperforms state-of-the-art single-point algorithms, while learning a better Pareto parameterization than multi-point baselines.
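The conjectured linear parameterization is easy to illustrate: given two single-task optima in weight space, each interpolation coefficient alpha yields one tradeoff point on a continuous front. The quadratic toy losses below are illustrative only; in the paper the subspace endpoints are learned jointly rather than taken as pre-trained single-task solutions.

```python
def pareto_front_by_interpolation(theta_task1, theta_task2, num_points=5):
    """Sketch of the core idea: linearly interpolate between two
    single-task weight vectors and read off one tradeoff per alpha."""
    front = []
    for k in range(num_points):
        alpha = k / (num_points - 1)
        theta = [alpha * w1 + (1 - alpha) * w2
                 for w1, w2 in zip(theta_task1, theta_task2)]
        front.append((alpha, theta))
    return front

# Two toy quadratic task losses with optima at all-ones and all-minus-ones.
loss1 = lambda th: sum((w - 1.0) ** 2 for w in th)
loss2 = lambda th: sum((w + 1.0) ** 2 for w in th)
front = pareto_front_by_interpolation([1.0, 1.0], [-1.0, -1.0])
losses = [(loss1(th), loss2(th)) for _, th in front]
```

Sweeping alpha at inference time then trades one task's loss against the other along a continuous curve, which is exactly the modulation the abstract describes.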
    Learning under Data Drift with Time-Varying Importance Weights. (arXiv:2210.01422v3 [cs.LG] UPDATED)
Real-world deployment of machine learning models is challenging because data evolves over time. While no model can work when data evolves in an arbitrary fashion, if there is some pattern to these changes, we might be able to design methods to address it. This paper addresses situations when data evolves gradually. We introduce a time-varying propensity score that can detect gradual shifts in the distribution of data which allows us to selectively sample past data to update the model -- not just similar data from the past like that of a standard propensity score but also data that evolved in a similar fashion in the past. The time-varying propensity score is quite general: we demonstrate different ways of implementing it and evaluate it on a variety of problems ranging from supervised learning (e.g., image classification problems) where data undergoes a sequence of gradual shifts, to reinforcement learning tasks (e.g., robotic manipulation and continuous control) where data shifts as the policy or the task changes.
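As a hedged illustration of what a time-dependent score buys: if the drift were a simple Gaussian mean shift with known variance (an assumption made purely for this sketch; the paper learns the score from data), each past sample can be reweighted by a per-step density ratio against the current distribution.

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def time_varying_weights(batches, sigma=1.0):
    """Toy stand-in for a time-varying propensity score: fit a per-step
    Gaussian to each past batch, then weight every past sample by the
    density ratio against the most recent step. Samples whose
    distribution matches the present get weight ~1; stale ones are
    down-weighted (closed-form only under this Gaussian assumption)."""
    means = [sum(b) / len(b) for b in batches]
    mu_now = means[-1]
    weights = []
    for b, mu_t in zip(batches, means):
        weights.append([gaussian_pdf(x, mu_now, sigma) / gaussian_pdf(x, mu_t, sigma)
                        for x in b])
    return weights

# Three batches whose mean drifts 0 -> 1 -> 2 over time.
batches = [[-0.5, 0.0, 0.5], [0.5, 1.0, 1.5], [1.5, 2.0, 2.5]]
weights = time_varying_weights(batches)
```

Within the oldest batch, the samples that already lie in the direction of the drift keep more weight than those that point away from it, which is the "data that evolved in a similar fashion" behavior described above.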
    BeGin: Extensive Benchmark Scenarios and An Easy-to-use Framework for Graph Continual Learning. (arXiv:2211.14568v2 [cs.LG] UPDATED)
Continual Learning (CL) is the process of ceaselessly learning a sequence of tasks. Most existing CL methods deal with independent data (e.g., images and text) for which many benchmark frameworks and results under standard experimental settings are available. However, CL methods for graph data (graph CL) are surprisingly underexplored because of (a) the lack of standard experimental settings, especially regarding how to deal with the dependency between instances, (b) the lack of benchmark datasets and scenarios, and (c) high complexity in implementation and evaluation due to the dependency. In this paper, regarding (a), we define four standard incremental settings (task-, class-, domain-, and time-incremental) for graph data, which are naturally applied to many node-, link-, and graph-level problems. Regarding (b), we provide 25 benchmark scenarios based on 15 real-world graphs. Regarding (c), we develop BeGin, an easy and fool-proof framework for graph CL. BeGin is easily extended since it is modularized with reusable modules for data processing, algorithm design, and evaluation. Especially, the evaluation module is completely separated from user code to eliminate potential mistakes. Using all the above, we report extensive benchmark results of 10 graph CL methods. Compared to the latest benchmark for graph CL, using BeGin, we cover 3x more combinations of incremental settings and levels of problems. All assets for the benchmark framework are available at https://github.com/ShinhwanKang/BeGin.
    MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning. (arXiv:2210.04183v3 [cs.CV] UPDATED)
    Multimodal representation learning has shown promising improvements on various vision-language tasks. Most existing methods excel at building global-level alignment between vision and language while lacking effective fine-grained image-text interaction. In this paper, we propose a jointly masked multimodal modeling method to learn fine-grained multimodal representations. Our method performs joint masking on image-text input and integrates both implicit and explicit targets for the masked signals to recover. The implicit target provides a unified and debiased objective for vision and language, where the model predicts latent multimodal representations of the unmasked input. The explicit target further enriches the multimodal representations by recovering high-level and semantically meaningful information: momentum visual features of image patches and concepts of word tokens. Through such a masked modeling process, our model not only learns fine-grained multimodal interaction, but also avoids the semantic gap between high-level representations and low- or mid-level prediction targets (e.g. image pixels), thus producing semantically rich multimodal representations that perform well on both zero-shot and fine-tuned settings. Our pre-trained model (named MAMO) achieves state-of-the-art performance on various downstream vision-language tasks, including image-text retrieval, visual question answering, visual reasoning, and weakly-supervised visual grounding.
    Visualizing Deep Neural Networks with Topographic Activation Maps. (arXiv:2204.03528v2 [cs.LG] UPDATED)
Machine Learning with Deep Neural Networks (DNNs) has become a successful tool in solving tasks across various fields of application. However, the complexity of DNNs makes it difficult to understand how they solve their learned task. To improve the explainability of DNNs, we adapt methods from neuroscience that analyze complex and opaque systems. Here, we draw inspiration from how neuroscience uses topographic maps to visualize brain activity. To also visualize activations of neurons in DNNs as topographic maps, we research techniques to lay out the neurons in a two-dimensional space such that neurons of similar activity are in the vicinity of each other. In this work, we introduce and compare methods to obtain a topographic layout of neurons in a DNN layer. Moreover, we demonstrate how to use topographic activation maps to identify errors or encoded biases and to visualize training processes. Our novel visualization technique improves the transparency of DNN-based decision-making systems and is interpretable without expert knowledge in Machine Learning.
    Using Neural Networks for Novelty-based Test Selection to Accelerate Functional Coverage Closure. (arXiv:2207.00445v3 [cs.SE] UPDATED)
Novel test selectors used in simulation-based verification have been shown to significantly accelerate coverage closure regardless of the number of coverage holes. This paper presents a configurable and highly-automated framework for novel test selection based on neural networks. Three configurations of this framework are tested with a commercial signal processing unit. All three convincingly outperform random test selection with the largest saving of simulation being 49.37% to reach 99.5% coverage. The computational expense of the configurations is negligible compared to the simulation reduction. We compare the experimental results and discuss important characteristics related to the performance of the configurations.
    PromptCast: A New Prompt-based Learning Paradigm for Time Series Forecasting. (arXiv:2210.08964v3 [stat.ME] UPDATED)
    This paper presents a new perspective on time series forecasting. In existing time series forecasting methods, the models take a sequence of numerical values as input and yield numerical values as output. The existing SOTA models are largely based on the Transformer architecture, modified with multiple encoding mechanisms to incorporate the context and semantics around the historical data. Inspired by the successes of pre-trained language foundation models, we pose a question about whether these models can also be adapted to solve time-series forecasting. Thus, we propose a new forecasting paradigm: prompt-based time series forecasting (PromptCast). In this novel task, the numerical input and output are transformed into prompts and the forecasting task is framed in a sentence-to-sentence manner, making it possible to directly apply language models for forecasting purposes. To support and facilitate the research of this task, we also present a large-scale dataset (PISA) that includes three real-world forecasting scenarios. We evaluate different SOTA numerical-based forecasting methods and language generation models. The benchmark results with various forecasting settings demonstrate the proposed PromptCast with language generation models is a promising research direction. Additionally, in comparison to conventional numerical-based forecasting, PromptCast shows a much better generalization ability under the zero-shot setting.
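The input/output transformation at the heart of this prompt-based framing can be sketched in a few lines. The sentence template and the regex-based answer parser below are illustrative choices, not the paper's exact wording or pipeline.

```python
import re

def make_prompt(history, unit="degrees", horizon_word="tomorrow"):
    """Turn a numeric window into a natural-language forecasting question
    (PromptCast-style input transformation; template is illustrative)."""
    past = ", ".join(f"{v:g} {unit}" for v in history)
    return (f"The values for the last {len(history)} days were {past}. "
            f"What will the value be {horizon_word}?")

def parse_answer(text):
    """Pull the first number out of a generated sentence,
    recovering a numeric forecast from the model's text output."""
    match = re.search(r"-?\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None

prompt = make_prompt([18, 21, 19])
predicted = parse_answer("It will be around 20 degrees.")
```

Any sentence-to-sentence language model can then sit between these two functions, which is what makes the paradigm model-agnostic.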
    White-Box Adversarial Policies in Deep Reinforcement Learning. (arXiv:2209.02167v2 [cs.AI] UPDATED)
    In reinforcement learning (RL), adversarial policies can be developed by training an adversarial agent to minimize a target agent's rewards. Prior work has studied black-box versions of these attacks where the adversary only observes the world state and treats the target agent as any other part of the environment. However, this does not take into account additional structure in the problem. In this work, we take inspiration from the literature on white-box attacks to train more effective adversarial policies. We study white-box adversarial policies and show that having access to a target agent's internal state can be useful for identifying its vulnerabilities. We make two contributions. (1) We introduce white-box adversarial policies where an attacker observes both a target's internal state and the world state at each timestep. We formulate ways of using these policies to attack agents in 2-player games and text-generating language models. (2) We demonstrate that these policies can achieve higher initial and asymptotic performance against a target agent than black-box controls. Code is available at https://github.com/thestephencasper/lm_white_box_attacks
    Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks. (arXiv:2202.00293v4 [stat.ML] UPDATED)
    Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in badly-generalizing local minima. Here we investigate the cross-over between these two regimes in the high-dimensional setting, and in particular investigate the connection between the so-called mean-field/hydrodynamic regime and the seminal approach of Saad & Solla. Focusing on the case of Gaussian data, we study the interplay between the learning rate, the time scale, and the number of hidden units in the high-dimensional dynamics of stochastic gradient descent (SGD). Our work builds on a deterministic description of SGD in high-dimensions from statistical physics, which we extend and for which we provide rigorous convergence rates.
    Causal Modeling of Policy Interventions From Sequences of Treatments and Outcomes. (arXiv:2209.04142v5 [cs.LG] UPDATED)
    A treatment policy defines when and what treatments are applied to affect some outcome of interest. Data-driven decision-making requires the ability to predict what happens if a policy is changed. Existing methods that predict how the outcome evolves under different scenarios assume that the tentative sequences of future treatments are fixed in advance, while in practice the treatments are determined stochastically by a policy and may depend, for example, on the efficiency of previous treatments. Therefore, the current methods are not applicable if the treatment policy is unknown or a counterfactual analysis is needed. To handle these limitations, we model the treatments and outcomes jointly in continuous time, by combining Gaussian processes and point processes. Our model enables the estimation of a treatment policy from observational sequences of treatments and outcomes, and it can predict the interventional and counterfactual progression of the outcome after an intervention on the treatment policy (in contrast with the causal effect of a single treatment). We show with real-world and semi-synthetic data on blood glucose progression that our method can answer causal queries more accurately than existing alternatives.
    HRFuser: A Multi-resolution Sensor Fusion Architecture for 2D Object Detection. (arXiv:2206.15157v2 [cs.CV] UPDATED)
    Besides standard cameras, autonomous vehicles typically include multiple additional sensors, such as lidars and radars, which help acquire richer information for perceiving the content of the driving scene. While several recent works focus on fusing certain pairs of sensors - such as camera with lidar or radar - by using architectural components specific to the examined setting, a generic and modular sensor fusion architecture is missing from the literature. In this work, we propose HRFuser, a modular architecture for multi-modal 2D object detection. It fuses multiple sensors in a multi-resolution fashion and scales to an arbitrary number of input modalities. The design of HRFuser is based on state-of-the-art high-resolution networks for image-only dense prediction and incorporates a novel multi-window cross-attention block as the means to perform fusion of multiple modalities at multiple resolutions. We demonstrate via extensive experiments on nuScenes and the adverse conditions DENSE datasets that our model effectively leverages complementary features from additional modalities, substantially improving upon camera-only performance and consistently outperforming state-of-the-art 3D and 2D fusion methods evaluated on 2D object detection metrics. The source code is publicly available.
    Analysis and Comparison of Classification Metrics. (arXiv:2209.05355v3 [cs.LG] UPDATED)
    A variety of different performance metrics are commonly used in the machine learning literature for the evaluation of classification systems. Some of the most common ones for measuring quality of hard decisions are standard and balanced accuracy, standard and balanced error rate, F-beta score, and Matthews correlation coefficient (MCC). In this document, we review the definition of these and other metrics and compare them with the expected cost (EC), a metric introduced in every statistical learning course but rarely used in the machine learning literature. We show that both the standard and balanced error rates are special cases of the EC. Further, we show its relation with F-score and MCC and argue that EC is superior to these traditional metrics, being more elegant, general, and intuitive, as well as being based on basic principles from statistics. The metrics above measure the quality of hard decisions. Yet, most modern classification systems output continuous scores for the classes which we may want to evaluate directly. Metrics for measuring the quality of system scores include the area under the ROC curve, equal error rate, cross-entropy, Brier score, and Bayes EC or Bayes risk, among others. The last three metrics are special cases of a family of metrics given by the expected value of proper scoring rules (PSRs). We review the theory behind these metrics and argue that they are the most principled way to measure the quality of the posterior probabilities produced by a system. Finally, we show how to use these metrics to compute the system's calibration loss and compare this metric with the standard expected calibration error (ECE), arguing that calibration loss based on PSRs is superior to the ECE for a variety of reasons.
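The claim that the balanced error rate is a special case of the EC is easy to verify numerically: with uniform priors and 0-1 costs over a row-normalized confusion matrix, the EC reduces to the average per-class error. The confusion matrix below is made up for illustration.

```python
def expected_cost(confusion, priors, costs):
    """Expected cost (EC) of hard decisions.
    confusion[i][j]: fraction of class-i samples decided as class j
                     (rows normalized to sum to 1).
    costs[i][j]:     cost of deciding j when the true class is i."""
    n = len(priors)
    return sum(priors[i] * confusion[i][j] * costs[i][j]
               for i in range(n) for j in range(n))

confusion = [[0.9, 0.1],
             [0.3, 0.7]]
uniform = [0.5, 0.5]          # uniform priors ...
zero_one = [[0, 1], [1, 0]]   # ... and 0-1 costs recover balanced error rate
bal_err = expected_cost(confusion, uniform, zero_one)
```

Here `bal_err` equals (0.1 + 0.3) / 2 = 0.2, the average of the two per-class error rates; swapping in empirical class priors instead of `uniform` recovers the standard error rate.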
    PLAtE: A Large-scale Dataset for List Page Web Extraction. (arXiv:2205.12386v2 [cs.CL] UPDATED)
Recently, neural models have been leveraged to significantly improve the performance of information extraction from semi-structured websites. However, a barrier for continued progress is the small number of datasets large enough to train these models. In this work, we introduce the PLAtE (Pages of Lists Attribute Extraction) benchmark dataset as a challenging new web extraction task. PLAtE focuses on shopping data, specifically extractions from product review pages with multiple items encompassing the tasks of: (1) finding product-list segmentation boundaries and (2) extracting attributes for each product. PLAtE is composed of 52,898 items collected from 6,694 pages and 156,014 attributes, making it the first large-scale list page web extraction dataset. We use a multi-stage approach to collect and annotate the dataset and adapt three state-of-the-art web extraction models to the two tasks comparing their strengths and weaknesses both quantitatively and qualitatively.
    Cognitive Ledger Project: Towards Building Personal Digital Twins Through Cognitive Blockchain. (arXiv:2201.08163v2 [cs.AI] UPDATED)
    The Cognitive Ledger Project is an effort to develop a modular system for turning users' personal data into structured information and machine learning models based on a blockchain-based infrastructure. In this work-in-progress paper, we propose a cognitive architecture for cognitive digital twins. The suggested design embraces a cognitive blockchain (Cognitive ledger) at its core. The architecture includes several modules that turn users' activities in the digital environment into reusable knowledge objects and artificial intelligence that one day can work together to form the cognitive digital twin of users.
    Curricular Subgoals for Inverse Reinforcement Learning. (arXiv:2306.08232v1 [cs.LG])
Inverse Reinforcement Learning (IRL) aims to reconstruct the reward function from expert demonstrations to facilitate policy learning, and has demonstrated its remarkable success in imitation learning. To promote expert-like behavior, existing IRL methods mainly focus on learning global reward functions to minimize the trajectory difference between the imitator and the expert. However, these global designs are still limited by the redundant noise and error propagation problems, leading to the unsuitable reward assignment and thus downgrading the agent capability in complex multi-stage tasks. In this paper, we propose a novel Curricular Subgoal-based Inverse Reinforcement Learning (CSIRL) framework that explicitly disentangles one task into several local subgoals to guide agent imitation. Specifically, CSIRL first introduces decision uncertainty of the trained agent over expert trajectories to dynamically select subgoals, which directly determines the exploration boundary of different task stages. To further acquire local reward functions for each stage, we customize a meta-imitation objective based on these curricular subgoals to train an intrinsic reward generator. Experiments on the D4RL and autonomous driving benchmarks demonstrate that the proposed method yields results superior to the state-of-the-art counterparts, as well as better interpretability. Our code is available at https://github.com/Plankson/CSIRL.
    ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations. (arXiv:2306.08141v1 [cs.AI])
As generative AI becomes more prevalent, it is important to study how human users interact with such models. In this work, we investigate how people use text-to-image models to generate desired target images. To study this interaction, we created ArtWhisperer, an online game where users are given a target image and are tasked with iteratively finding a prompt that creates a similar-looking image as the target. Through this game, we recorded over 50,000 human-AI interactions; each interaction corresponds to one text prompt created by a user and the corresponding generated image. The majority of these are repeated interactions where a user iterates to find the best prompt for their target image, making this a unique sequential dataset for studying human-AI collaborations. In an initial analysis of this dataset, we identify several characteristics of prompt interactions and user strategies. People submit diverse prompts and are able to discover a variety of text descriptions that generate similar images. Interestingly, prompt diversity does not decrease as users find better prompts. We further propose a new metric to study the steerability of AI using our dataset. We define steerability as the expected number of interactions required to adequately complete a task. We estimate this value by fitting a Markov chain for each target task and calculating the expected time to reach an adequate score in the Markov chain. We quantify and compare AI steerability across different types of target images and two different models, finding that images of cities and natural world images are more steerable than artistic and fantasy images. These findings provide insights into human-AI interaction behavior, present a concrete method of assessing AI steerability, and demonstrate the general utility of the ArtWhisperer dataset.
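The steerability estimate reduces to an expected hitting time. Given a fitted per-task transition matrix over score states (the 3-state chain below is made up for illustration), the expected number of interactions to first reach the "adequate score" state solves E[i] = 1 + sum_j P[i][j] E[j] with E[target] = 0:

```python
def expected_interactions(transition, target):
    """Expected number of steps to first reach `target` from state 0,
    found by fixed-point iteration on E[i] = 1 + sum_j P[i][j] * E[j]
    (a sketch of the expected-hitting-time computation behind the
    steerability metric; a direct linear solve works equally well)."""
    n = len(transition)
    E = [0.0] * n
    for _ in range(10000):
        E = [0.0 if i == target else
             1.0 + sum(transition[i][j] * E[j] for j in range(n))
             for i in range(n)]
    return E[0]

# States: 0 = poor score, 1 = medium score, 2 = adequate (absorbing).
P = [[0.5, 0.4, 0.1],
     [0.2, 0.5, 0.3],
     [0.0, 0.0, 1.0]]
steps = expected_interactions(P, target=2)
```

For this chain the answer is 90/17, roughly 5.3 interactions; a more steerable task would concentrate transition mass toward the absorbing state and shrink this number.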
    Volterra Neural Networks (VNNs). (arXiv:1910.09616v5 [cs.CV] UPDATED)
The importance of inference in Machine Learning (ML) has led to an explosive number of different proposals in ML, and particularly in Deep Learning. In an attempt to reduce the complexity of Convolutional Neural Networks, we propose a Volterra filter-inspired Network architecture. This architecture introduces controlled non-linearities in the form of interactions between the delayed input samples of data. We propose a cascaded implementation of Volterra Filtering so as to significantly reduce the number of parameters required to carry out the same classification task as that of a conventional Neural Network. We demonstrate an efficient parallel implementation of this Volterra Neural Network (VNN), along with its remarkable performance while retaining a relatively simpler and potentially more tractable structure. Furthermore, we show a rather sophisticated adaptation of this network to nonlinearly fuse the RGB (spatial) information and the Optical Flow (temporal) information of a video sequence for action recognition. The proposed approach is evaluated on UCF-101 and HMDB-51 datasets for action recognition, and is shown to outperform state-of-the-art CNN approaches.
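The controlled non-linearity such an architecture builds on is a truncated Volterra expansion over delayed inputs: the output mixes delayed samples linearly and through pairwise products. A second-order sketch (kernel sizes and values are illustrative only, not the paper's learned cascaded form):

```python
def volterra_2nd_order(x, h1, h2):
    """Second-order Volterra filter over a 1-D signal.
    h1[i]:    linear kernel over delays i = 0..K-1.
    h2[i][j]: quadratic kernel mixing delayed samples pairwise,
              which is the controlled non-linearity described above."""
    K = len(h1)
    y = []
    for n in range(len(x)):
        # Delayed window x[n], x[n-1], ..., zero-padded at the start.
        window = [x[n - i] if n - i >= 0 else 0.0 for i in range(K)]
        linear = sum(h1[i] * window[i] for i in range(K))
        quad = sum(h2[i][j] * window[i] * window[j]
                   for i in range(K) for j in range(K))
        y.append(linear + quad)
    return y

y = volterra_2nd_order([1.0, 2.0, 0.0],
                       h1=[1.0, 0.5],
                       h2=[[0.0, 0.1], [0.1, 0.0]])
```

Cascading several such low-order filters approximates higher-order interactions while keeping the parameter count far below a full high-order kernel, which is the paper's parameter-reduction argument.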
    h2oGPT: Democratizing Large Language Models. (arXiv:2306.08161v1 [cs.CL])
Foundation Large Language Models (LLMs) such as GPT-4 represent a revolution in AI due to their real-world applications through natural language processing. However, they also pose many significant risks such as the presence of biased, private, or harmful text, and the unauthorized inclusion of copyrighted material. We introduce h2oGPT, a suite of open-source code repositories for the creation and use of Large Language Models (LLMs) based on Generative Pretrained Transformers (GPTs). The goal of this project is to create the world's best truly open-source alternative to closed-source GPTs. In collaboration with and as part of the incredible and unstoppable open-source community, we open-source several fine-tuned h2oGPT models from 7 to 40 Billion parameters, ready for commercial use under fully permissive Apache 2.0 licenses. Included in our release is 100% private document search using natural language. Open-source language models help boost AI development and make it more accessible and trustworthy. They lower entry hurdles, allowing people and groups to tailor these models to their needs. This openness increases innovation, transparency, and fairness. An open-source strategy is needed to share AI benefits fairly, and H2O.ai will continue to democratize AI and LLMs.
    Leveraging dendritic properties to advance machine learning and neuro-inspired computing. (arXiv:2306.08007v1 [cs.NE])
    The brain is a remarkably capable and efficient system. It can process and store huge amounts of noisy and unstructured information using minimal energy. In contrast, current artificial intelligence (AI) systems require vast resources for training while still struggling to compete in tasks that are trivial for biological agents. Thus, brain-inspired engineering has emerged as a promising new avenue for designing sustainable, next-generation AI systems. Here, we describe how dendritic mechanisms of biological neurons have inspired innovative solutions for significant AI problems, including credit assignment in multilayer networks, catastrophic forgetting, and high energy consumption. These findings provide exciting alternatives to existing architectures, showing how dendritic research can pave the way for building more powerful and energy-efficient artificial learning systems.
    Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning Approach to Critical Care. (arXiv:2306.08044v1 [cs.LG])
    Most medical treatment decisions are sequential in nature. Hence, there is substantial hope that reinforcement learning may make it possible to formulate precise data-driven treatment plans. However, a key challenge for most applications in this field is the sparse nature of primarily mortality-based reward functions, leading to decreased stability of offline estimates. In this work, we introduce a deep Q-learning approach able to obtain more reliable critical care policies. This method integrates relevant but noisy intermediate biomarker signals into the reward specification, without compromising the optimization of the main outcome of interest (e.g. patient survival). We achieve this by first pruning the action set based on all available rewards, and second training a final model based on the sparse main reward but with a restricted action set. By disentangling accurate and approximated rewards through action pruning, potential distortions of the main objective are minimized, all while enabling the extraction of valuable information from intermediate signals that can guide the learning process. We evaluate our method in both off-policy and offline settings using simulated environments and real health records of patients in intensive care units. Our empirical results indicate that pruning significantly reduces the size of the action space while staying mostly consistent with the actions taken by physicians, outperforming the current state-of-the-art offline reinforcement learning method conservative Q-learning. Our work is a step towards developing reliable policies by effectively harnessing the wealth of available information in data-intensive critical care environments.
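The two-stage idea in this abstract (prune the action set using all available rewards, then train on the sparse main reward over the restricted set) can be sketched with tabular Q-values. Everything below is an illustrative assumption, not the paper's procedure: the top-k pruning rule, the random "Q-tables", and the state/action sizes are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 8

# Stage 1: Q-values trained on the dense reward (main outcome plus
# noisy intermediate biomarker signals).
q_dense = rng.normal(size=(n_states, n_actions))

# Prune: keep only the top-k actions per state under the dense reward.
k = 3
allowed = np.argsort(q_dense, axis=1)[:, -k:]   # indices of the k best actions
mask = np.zeros_like(q_dense, dtype=bool)
np.put_along_axis(mask, allowed, True, axis=1)

# Stage 2: Q-values trained on the sparse main reward (e.g. survival);
# the greedy policy may only pick from the pruned action set.
q_sparse = rng.normal(size=(n_states, n_actions))
policy = np.where(mask, q_sparse, -np.inf).argmax(axis=1)
```

By construction, the final policy never chooses an action that the dense-reward stage pruned away, which is how the approximated intermediate rewards guide learning without distorting the main objective.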
    Feature Engineering-Based Detection of Buffer Overflow Vulnerability in Source Code Using Neural Networks. (arXiv:2306.07981v1 [cs.CR])
    One of the most significant challenges in the field of software code auditing is the presence of vulnerabilities in software source code. Every year, more and more software flaws are discovered, either internally in proprietary code or publicly disclosed. These flaws are highly likely to be exploited and can lead to system compromise, data leakage, or denial of service. To create a large-scale machine learning system for function-level vulnerability identification, we utilized a sizable dataset of C and C++ open-source code containing millions of functions with potential buffer overflow exploits. We have developed an efficient and scalable vulnerability detection method based on neural network models that learn features extracted from the source codes. The source code is first converted into an intermediate representation to remove unnecessary components and shorten dependencies. We maintain the semantic and syntactic information using state-of-the-art word embedding algorithms such as GloVe and fastText. The embedded vectors are subsequently fed into neural networks such as LSTM, BiLSTM, LSTM Autoencoder, word2vec, BERT, and GPT2 to classify the possible vulnerabilities. Furthermore, we have proposed a neural network model that can overcome issues associated with traditional neural networks. We have used evaluation metrics such as F1 score, precision, recall, accuracy, and total execution time to measure the performance. We have conducted a comparative analysis between results derived from features containing a minimal text representation and semantic and syntactic information.
    Realising Synthetic Active Inference Agents, Part I: Epistemic Objectives and Graphical Specification Language. (arXiv:2306.08014v1 [cs.AI])
    The Free Energy Principle (FEP) is a theoretical framework for describing how (intelligent) systems self-organise into coherent, stable structures by minimising a free energy functional. Active Inference (AIF) is a corollary of the FEP that specifically details how systems that are able to plan for the future (agents) function by minimising particular free energy functionals that incorporate information seeking components. This paper is the first in a series of two where we derive a synthetic version of AIF on free form factor graphs. The present paper focuses on deriving a local version of the free energy functionals used for AIF. This enables us to construct a version of AIF which applies to arbitrary graphical models and interfaces with prior work on message passing algorithms. The resulting messages are derived in our companion paper. We also identify a gap in the graphical notation used for factor graphs. While factor graphs are great at expressing a generative model, they have so far been unable to specify the full optimisation problem including constraints. To solve this problem we develop Constrained Forney-style Factor Graph (CFFG) notation which permits a fully graphical description of variational inference objectives. We then proceed to show how CFFG's can be used to reconstruct prior algorithms for AIF as well as derive new ones. The latter is demonstrated by deriving an algorithm that permits direct policy inference for AIF agents, circumventing a long standing scaling issue that has so far hindered the application of AIF in industrial settings. We demonstrate our algorithm on the classic T-maze task and show that it reproduces the information seeking behaviour that is a hallmark feature of AIF.
    Leveraging Machine Learning for Multichain DeFi Fraud Detection. (arXiv:2306.07972v1 [q-fin.GN])
    Since the inception of permissionless blockchains with Bitcoin in 2008, it became apparent that their most well-suited use case is making the financial system and its advantages available to everyone seamlessly, without depending on any trusted intermediaries. Smart contracts across chains provide an ecosystem of decentralized finance (DeFi), where users can interact with lending pools, Automated Market Maker (AMM) exchanges, stablecoins, derivatives, etc., with a cumulative locked value that had exceeded 160B USD. While DeFi comes with high rewards, it also carries plenty of risks. Many financial crimes have occurred over the years, making the early detection of malicious activity an issue of high priority. The proposed framework introduces an effective method for extracting a set of features from different chains, including the largest one, Ethereum, and is evaluated over an extensive novel dataset we gathered in collaboration with Covalent, covering the transactions of the most widely used DeFi protocols (23 in total, including Aave, Compound, Curve, Lido, and Yearn). Different machine learning methods, such as XGBoost and a neural network, were employed to detect fraudulent accounts interacting with DeFi, and we demonstrate that the introduction of novel DeFi-related features significantly improves the results, as measured by Accuracy, Precision, Recall, F1-score, and F2-score.
    Using Interventions to Improve Out-of-Distribution Generalization of Text-Matching Recommendation Systems. (arXiv:2210.10636v2 [cs.IR] UPDATED)
    Given a user's input text, text-matching recommender systems output relevant items by comparing the input text to available items' descriptions, such as product-to-product recommendation on e-commerce platforms. As users' interests and item inventory are expected to change, it is important for a text-matching system to generalize to data shifts, a task known as out-of-distribution (OOD) generalization. However, we find that the popular approach of fine-tuning a large, base language model on paired item relevance data (e.g., user clicks) can be counter-productive for OOD generalization. For a product recommendation task, fine-tuning obtains worse accuracy than the base model when recommending items in a new category or for a future time period. To explain this generalization failure, we consider an intervention-based importance metric, which shows that a fine-tuned model captures spurious correlations and fails to learn the causal features that determine the relevance between any two text inputs. Moreover, standard methods for causal regularization do not apply in this setting, because unlike in images, there exist no universally spurious features in a text-matching task (the same token may be spurious or causal depending on the text it is being matched to). For OOD generalization on text inputs, therefore, we highlight a different goal: avoiding high importance scores for certain features. We do so using an intervention-based regularizer that constrains the causal effect of any token on the model's relevance score to be similar to the base model. Results on Amazon product and 3 question recommendation datasets show that our proposed regularizer improves generalization for both in-distribution and OOD evaluation, especially in difficult scenarios when the base model is not accurate.
    Reducing SO(3) Convolutions to SO(2) for Efficient Equivariant GNNs. (arXiv:2302.03655v2 [cs.LG] UPDATED)
    Graph neural networks that model 3D data, such as point clouds or atoms, are typically desired to be $SO(3)$ equivariant, i.e., equivariant to 3D rotations. Unfortunately, equivariant convolutions, which are a fundamental operation for equivariant networks, increase significantly in computational complexity as higher-order tensors are used. In this paper, we address this issue by reducing the $SO(3)$ convolutions or tensor products to mathematically equivalent convolutions in $SO(2)$. This is accomplished by aligning the node embeddings' primary axis with the edge vectors, which sparsifies the tensor product and reduces the computational complexity from $O(L^6)$ to $O(L^3)$, where $L$ is the degree of the representation. We demonstrate the potential implications of this improvement by proposing the Equivariant Spherical Channel Network (eSCN), a graph neural network utilizing our novel approach to equivariant convolutions, which achieves state-of-the-art results on the large-scale OC-20 and OC-22 datasets.
    PrivaScissors: Enhance the Privacy of Collaborative Inference through the Lens of Mutual Information. (arXiv:2306.07973v1 [cs.CR])
    Edge-cloud collaborative inference empowers resource-limited IoT devices to support deep learning applications without disclosing their raw data to the cloud server, thus preserving privacy. Nevertheless, prior research has shown that collaborative inference still results in the exposure of data and predictions from edge devices. To enhance the privacy of collaborative inference, we introduce a defense strategy called PrivaScissors, which is designed to reduce the mutual information between a model's intermediate outcomes and the device's data and predictions. We evaluate PrivaScissors's performance on several datasets in the context of diverse attacks and offer a theoretical robustness guarantee.
    iSAGE: An Incremental Version of SAGE for Online Explanation on Data Streams. (arXiv:2303.01181v2 [cs.LG] UPDATED)
    Existing methods for explainable artificial intelligence (XAI), including popular feature importance measures such as SAGE, are mostly restricted to the batch learning scenario. However, machine learning is often applied in dynamic environments, where data arrives continuously and learning must be done in an online manner. Therefore, we propose iSAGE, a time- and memory-efficient incrementalization of SAGE, which is able to react to changes in the model as well as to drift in the data-generating process. We further provide efficient feature removal methods that break (interventional) and retain (observational) feature dependencies. Moreover, we formally analyze our explanation method to show that iSAGE adheres to similar theoretical properties as SAGE. Finally, we evaluate our approach in a thorough experimental analysis based on well-established data sets and data streams with concept drift.
    SPADE4: Sparsity and Delay Embedding based Forecasting of Epidemics. (arXiv:2211.08277v2 [cs.LG] UPDATED)
    Predicting the evolution of diseases is challenging, especially when the data availability is scarce and incomplete. The most popular tools for modelling and predicting infectious disease epidemics are compartmental models. They stratify the population into compartments according to health status and model the dynamics of these compartments using dynamical systems. However, these predefined systems may not capture the true dynamics of the epidemic due to the complexity of the disease transmission and human interactions. In order to overcome this drawback, we propose Sparsity and Delay Embedding based Forecasting (SPADE4) for predicting epidemics. SPADE4 predicts the future trajectory of an observable variable without the knowledge of the other variables or the underlying system. We use random features model with sparse regression to handle the data scarcity issue and employ Takens' delay embedding theorem to capture the nature of the underlying system from the observed variable. We show that our approach outperforms compartmental models when applied to both simulated and real data.
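The delay-embedding ingredient of SPADE4 (Takens' theorem applied to a single observable) amounts to stacking lagged copies of the series. The sketch below shows only this construction; the signal, embedding dimension, and lag are made-up values, and the paper's sparse random-features regression is not included.

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Takens-style delay embedding: map a scalar series x into
    vectors (x_t, x_{t+tau}, ..., x_{t+(dim-1)*tau})."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

# Hypothetical observable: a single epidemic-style wave of case counts.
t = np.linspace(0, 10, 200)
x = np.exp(-0.5 * (t - 5) ** 2)
emb = delay_embed(x, dim=3, tau=5)
print(emb.shape)  # (190, 3)
```

Each row of `emb` is a point in the reconstructed state space, on which a regression model can then be fit to predict the observable's future values.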
    Large language models predict human sensory judgments across six modalities. (arXiv:2302.01308v2 [cs.CL] UPDATED)
    Determining the extent to which the perceptual world can be recovered from language is a longstanding problem in philosophy and cognitive science. We show that state-of-the-art large language models can unlock new insights into this problem by providing a lower bound on the amount of perceptual information that can be extracted from language. Specifically, we elicit pairwise similarity judgments from GPT models across six psychophysical datasets. We show that the judgments are significantly correlated with human data across all domains, recovering well-known representations like the color wheel and pitch spiral. Surprisingly, we find that a model (GPT-4) co-trained on vision and language does not necessarily lead to improvements specific to the visual modality. To study the influence of specific languages on perception, we also apply the models to a multilingual color-naming task. We find that GPT-4 replicates cross-linguistic variation in English and Russian, illuminating the interaction of language and perception.
    Localised Adaptive Spatial-Temporal Graph Neural Network. (arXiv:2306.06930v2 [cs.LG] UPDATED)
    Spatial-temporal graph models are prevailing for abstracting and modelling spatial and temporal dependencies. In this work, we ask the following question: whether and to what extent can we localise spatial-temporal graph models? We limit our scope to adaptive spatial-temporal graph neural networks (ASTGNNs), the state-of-the-art model architecture. Our approach to localisation involves sparsifying the spatial graph adjacency matrices. To this end, we propose Adaptive Graph Sparsification (AGS), a graph sparsification algorithm which successfully enables the localisation of ASTGNNs to an extreme extent (full localisation). We apply AGS to two distinct ASTGNN architectures and nine spatial-temporal datasets. Intriguingly, we observe that spatial graphs in ASTGNNs can be sparsified by over 99.5% without any decline in test accuracy. Furthermore, even when ASTGNNs are fully localised, becoming graph-less and purely temporal, we record no drop in accuracy for the majority of tested datasets, with only minor accuracy deterioration observed in the remaining datasets. However, when the partially or fully localised ASTGNNs are reinitialised and retrained on the same data, there is a considerable and consistent drop in accuracy. Based on these observations, we reckon that (i) in the tested data, the information provided by the spatial dependencies is primarily included in the information provided by the temporal dependencies and, thus, can be essentially ignored for inference; and (ii) although the spatial dependencies provide redundant information, it is vital for the effective training of ASTGNNs and thus cannot be ignored during training. Furthermore, the localisation of ASTGNNs holds the potential to reduce the heavy computation overhead required on large-scale spatial-temporal data and further enable the distributed deployment of ASTGNNs.
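The sparsification operation at the core of localisation can be illustrated with plain magnitude pruning of a learned adjacency matrix. AGS itself is more involved, so treat this as a minimal sketch of the operation being applied, not of the algorithm; the matrix size and keep-fraction are arbitrary.

```python
import numpy as np

def sparsify(adj, keep_frac):
    """Keep only the largest-magnitude entries of a learned adjacency
    matrix and zero the rest (simple magnitude pruning)."""
    flat = np.abs(adj).ravel()
    k = max(1, int(keep_frac * flat.size))
    thresh = np.partition(flat, -k)[-k]         # k-th largest magnitude
    return np.where(np.abs(adj) >= thresh, adj, 0.0)

rng = np.random.default_rng(0)
adj = rng.normal(size=(100, 100))               # hypothetical learned adjacency
sparse = sparsify(adj, keep_frac=0.005)         # keep only 0.5% of the entries
print((sparse != 0).mean())
```

At `keep_frac=0.0`-style extremes the spatial graph disappears entirely, which is the "graph-less, purely temporal" regime the abstract describes.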
    Directional Privacy for Deep Learning. (arXiv:2211.04686v2 [cs.LG] UPDATED)
    Differentially Private Stochastic Gradient Descent (DP-SGD) is a key method for applying privacy in the training of deep learning models. This applies isotropic Gaussian noise to gradients during training, which can perturb these gradients in any direction, damaging utility. Metric DP, however, can provide alternative mechanisms based on arbitrary metrics that might be more suitable for preserving utility. In this paper, we apply \textit{directional privacy}, via a mechanism based on the von Mises-Fisher (VMF) distribution, to perturb gradients in terms of \textit{angular distance} so that gradient direction is broadly preserved. We show that this provides both $\epsilon$-DP and $\epsilon d$-privacy for deep learning training, rather than the $(\epsilon, \delta)$-privacy of the Gaussian mechanism; we observe that the $\epsilon d$-privacy guarantee does not require a $\delta>0$ term but degrades smoothly according to the dissimilarity of the input gradients. As $\epsilon$s between these different frameworks cannot be directly compared, we examine empirical privacy calibration mechanisms that go beyond previous work on empirically calibrating privacy within standard DP frameworks using membership inference attacks (MIA); we show that a combination of enhanced MIA and reconstruction attacks provides a suitable method for privacy calibration. Experiments on key datasets then indicate that the VMF mechanism can outperform the Gaussian in the utility-privacy trade-off. In particular, our experiments provide a direct comparison of privacy between the two approaches in terms of their ability to defend against reconstruction and membership inference.
    Self-supervised Equality Embedded Deep Lagrange Dual for Approximate Constrained Optimization. (arXiv:2306.06674v2 [math.OC] UPDATED)
    Conventional solvers are often computationally expensive for constrained optimization, particularly in large-scale and time-critical problems. While this leads to a growing interest in using neural networks (NNs) as fast optimal solution approximators, incorporating the constraints with NNs is challenging. In this regard, we propose deep Lagrange dual with equality embedding (DeepLDE), a framework that learns to find an optimal solution without using labels. To ensure feasible solutions, we embed equality constraints into the NNs and train the NNs using the primal-dual method to impose inequality constraints. Furthermore, we prove the convergence of DeepLDE and show that the primal-dual learning method alone cannot ensure equality constraints without the help of equality embedding. Simulation results on convex, non-convex, and AC optimal power flow (AC-OPF) problems show that the proposed DeepLDE achieves the smallest optimality gap among all the NN-based approaches while always ensuring feasible solutions. Furthermore, the computation time of the proposed method is about 5 to 250 times faster than DC3 and the conventional solvers in solving constrained convex, non-convex optimization, and/or AC-OPF.
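The primal-dual update used to impose inequality constraints can be seen on a one-dimensional toy problem: minimise f(x) = x² subject to g(x) = 1 − x ≤ 0, whose constrained optimum is x = 1 with multiplier λ = 2. This is a hand-rolled sketch of the update rule only; DeepLDE's equality embedding and neural parameterisation are not shown, and the learning rate and iteration count are arbitrary.

```python
# Primal-dual (Lagrangian) iteration on: min x^2  s.t.  1 - x <= 0.
# L(x, lam) = x^2 + lam * (1 - x)
lr = 0.1
x, lam = 0.0, 0.0
for _ in range(2000):
    x -= lr * (2 * x - lam)              # primal descent step on L
    lam = max(0.0, lam + lr * (1 - x))   # dual ascent, projected to lam >= 0

print(x, lam)  # converges to x = 1, lam = 2
```

The projection `max(0, ...)` keeps the multiplier feasible; in DeepLDE the same ascent is performed on the dual variables while the network weights play the role of `x`.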
    InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning. (arXiv:2305.06500v2 [cs.CV] UPDATED)
    Large-scale pre-training and instruction tuning have been successful at creating general-purpose language models with broad competence. However, building general-purpose vision-language models is challenging due to the rich input distributions and task diversity resulting from the additional visual input. Although vision-language pretraining has been widely studied, vision-language instruction tuning remains under-explored. In this paper, we conduct a systematic and comprehensive study on vision-language instruction tuning based on the pretrained BLIP-2 models. We gather 26 publicly available datasets, covering a wide variety of tasks and capabilities, and transform them into instruction tuning format. Additionally, we introduce an instruction-aware Query Transformer, which extracts informative features tailored to the given instruction. Trained on 13 held-in datasets, InstructBLIP attains state-of-the-art zero-shot performance across all 13 held-out datasets, substantially outperforming BLIP-2 and larger Flamingo models. Our models also lead to state-of-the-art performance when finetuned on individual downstream tasks (e.g., 90.7% accuracy on ScienceQA questions with image contexts). Furthermore, we qualitatively demonstrate the advantages of InstructBLIP over concurrent multimodal models. All InstructBLIP models are open-sourced at https://github.com/salesforce/LAVIS/tree/main/projects/instructblip.
    MixupE: Understanding and Improving Mixup from Directional Derivative Perspective. (arXiv:2212.13381v3 [cs.LG] UPDATED)
    Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve the generalization performance in many learning paradigms and applications. In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. Based on this new insight, we propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup. To demonstrate the effectiveness of the proposed method, we conduct experiments across various domains such as images, tabular data, speech, and graphs. Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures, for instance, exhibiting an improvement over Mixup by 0.8% in ImageNet top-1 accuracy.
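Vanilla Mixup, the starting point of the paper's analysis, is only a few lines. The Beta parameter and toy data below are arbitrary choices for illustration, and the proposed MixupE regularizer itself is not reproduced here.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Vanilla Mixup: a convex combination of two inputs and their
    one-hot labels, with lambda drawn from Beta(alpha, alpha)."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=8), rng.normal(size=8)
y1, y2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
x_mix, y_mix = mixup(x1, y1, x2, y2, rng=rng)
```

Because the same lambda mixes inputs and labels, the mixed label is still a probability distribution, which is the property the directional-derivative analysis of the paper builds on.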
    Tool Learning with Foundation Models. (arXiv:2304.08354v2 [cs.CL] UPDATED)
    Humans possess an extraordinary ability to create and utilize tools, allowing them to overcome physical limitations and explore new frontiers. With the advent of foundation models, AI systems have the potential to be equally adept in tool use as humans. This paradigm, i.e., tool learning with foundation models, combines the strengths of specialized tools and foundation models to achieve enhanced accuracy, efficiency, and automation in problem-solving. Despite its immense potential, there is still a lack of a comprehensive understanding of key challenges, opportunities, and future endeavors in this field. To this end, we present a systematic investigation of tool learning in this paper. We first introduce the background of tool learning, including its cognitive origins, the paradigm shift of foundation models, and the complementary roles of tools and models. Then we recapitulate existing tool learning research into tool-augmented and tool-oriented learning. We formulate a general tool learning framework: starting from understanding the user instruction, models should learn to decompose a complex task into several subtasks, dynamically adjust their plan through reasoning, and effectively conquer each sub-task by selecting appropriate tools. We also discuss how to train models for improved tool-use capabilities and facilitate the generalization in tool learning. Considering the lack of a systematic tool learning evaluation in prior works, we experiment with 18 representative tools and show the potential of current foundation models in skillfully utilizing tools. Finally, we discuss several open problems that require further investigation for tool learning. Overall, we hope this paper could inspire future research in integrating tools with foundation models.
    Unprocessing Seven Years of Algorithmic Fairness. (arXiv:2306.07261v2 [cs.LG] UPDATED)
    Seven years ago, researchers proposed a postprocessing method to equalize the error rates of a model across different demographic groups. The work launched hundreds of papers purporting to improve over the postprocessing baseline. We empirically evaluate these claims through thousands of model evaluations on several tabular datasets. We find that the fairness-accuracy Pareto frontier achieved by postprocessing contains all other methods we were feasibly able to evaluate. In doing so, we address two common methodological errors that have confounded previous observations. One relates to the comparison of methods with different unconstrained base models. The other concerns methods achieving different levels of constraint relaxation. At the heart of our study is a simple idea we call unprocessing that roughly corresponds to the inverse of postprocessing. Unprocessing allows for a direct comparison of methods using different underlying models and levels of relaxation. Interpreting our findings, we recall a widely overlooked theoretical argument, presented seven years ago, that accurately predicted what we observe.
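Postprocessing in its simplest form picks group-specific thresholds on a fixed model's scores. The sketch below equalises selection rates rather than the error rates studied in the paper (equalising error rates needs labelled data), and the scores, groups, and target rate are all synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.uniform(size=1000)          # hypothetical model scores
group = rng.integers(0, 2, size=1000)    # binary group membership

# Pick a per-group threshold so both groups have the same selection
# rate (here 30%), leaving the underlying model untouched.
target = 0.30
thresholds = {g: np.quantile(scores[group == g], 1 - target) for g in (0, 1)}
pred = np.array([scores[i] >= thresholds[group[i]] for i in range(len(scores))])

for g in (0, 1):
    print(g, pred[group == g].mean())    # both close to 0.30
```

Because only the thresholds change, this kind of postprocessing can be layered on any base model, which is what makes it a natural baseline for the comparison the paper carries out.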
    NF4 Isn't Information Theoretically Optimal (and that's Good). (arXiv:2306.06965v2 [cs.LG] UPDATED)
    This note shares some simple calculations and experiments related to absmax-based blockwise quantization, as used in Dettmers et al., 2023. Their proposed NF4 data type is said to be information theoretically optimal for representing normally distributed weights. I show that this can't quite be the case, as the distribution of the values to be quantized depends on the block-size. I attempt to apply these insights to derive an improved code based on minimizing the expected L1 reconstruction error, rather than the quantile based method. This leads to improved performance for larger quantization block sizes, while both codes perform similarly at smaller block sizes.
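Absmax-based blockwise quantization, the setting of this note, is easy to reproduce. The 4-level code below is made up for illustration; it is neither NF4 nor the proposed L1-optimal code, and the block size and weight tensor are arbitrary.

```python
import numpy as np

def absmax_quantize(w, code, block_size=64):
    """Absmax blockwise quantization: scale each block by its largest
    absolute value, then snap every entry to the nearest code level."""
    blocks = w.reshape(-1, block_size)
    scale = np.abs(blocks).max(axis=1, keepdims=True)
    idx = np.abs(blocks[:, :, None] / scale[:, :, None] - code).argmin(axis=-1)
    return (code[idx] * scale).reshape(w.shape), idx

# A toy 4-level (2-bit) code in [-1, 1]. The note's point is that the
# optimal placement of these levels depends on the block size.
code = np.array([-1.0, -0.33, 0.33, 1.0])
rng = np.random.default_rng(0)
w = rng.normal(size=256)
w_hat, idx = absmax_quantize(w, code)
print(np.abs(w - w_hat).mean())   # mean L1 reconstruction error
```

Varying `block_size` changes the distribution of `blocks / scale`, which is exactly why a single fixed quantile-based code cannot be information-theoretically optimal for every block size.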
    Tight Certification of Adversarially Trained Neural Networks via Nonconvex Low-Rank Semidefinite Relaxations. (arXiv:2211.17244v3 [cs.LG] UPDATED)
    Adversarial training is well-known to produce high-quality neural network models that are empirically robust against adversarial perturbations. Nevertheless, once a model has been adversarially trained, one often desires a certification that the model is truly robust against all future attacks. Unfortunately, when faced with adversarially trained models, all existing approaches have significant trouble making certifications that are strong enough to be practically useful. Linear programming (LP) techniques in particular face a "convex relaxation barrier" that prevent them from making high-quality certifications, even after refinement with mixed-integer linear programming (MILP) and branch-and-bound (BnB) techniques. In this paper, we propose a nonconvex certification technique, based on a low-rank restriction of a semidefinite programming (SDP) relaxation. The nonconvex relaxation makes strong certifications comparable to much more expensive SDP methods, while optimizing over dramatically fewer variables comparable to much weaker LP methods. Despite nonconvexity, we show how off-the-shelf local optimization algorithms can be used to achieve and to certify global optimality in polynomial time. Our experiments find that the nonconvex relaxation almost completely closes the gap towards exact certification of adversarially trained models.
    Stable Deep MRI Reconstruction using Generative Priors. (arXiv:2210.13834v3 [eess.IV] UPDATED)
    Data-driven approaches recently achieved remarkable success in magnetic resonance imaging (MRI) reconstruction, but integration into clinical routine remains challenging due to a lack of generalizability and interpretability. In this paper, we address these challenges in a unified framework based on generative image priors. We propose a novel deep neural network based regularizer which is trained in a generative setting on reference magnitude images only. After training, the regularizer encodes higher-level domain statistics which we demonstrate by synthesizing images without data. Embedding the trained model in a classical variational approach yields high-quality reconstructions irrespective of the sub-sampling pattern. In addition, the model shows stable behavior when confronted with out-of-distribution data in the form of contrast variation. Furthermore, a probabilistic interpretation provides a distribution of reconstructions and hence allows uncertainty quantification. To reconstruct parallel MRI, we propose a fast algorithm to jointly estimate the image and the sensitivity maps. The results demonstrate competitive performance, on par with state-of-the-art end-to-end deep learning methods, while preserving the flexibility with respect to sub-sampling patterns and allowing for uncertainty quantification.
    Improved disentangled speech representations using contrastive learning in factorized hierarchical variational autoencoder. (arXiv:2211.08191v2 [eess.AS] UPDATED)
    Leveraging the fact that speaker identity and content vary on different time scales, the factorized hierarchical variational autoencoder (FHVAE) uses different latent variables to symbolize these two attributes. Disentanglement of these attributes is carried out by different prior settings of the corresponding latent variables. For the prior of the speaker identity variable, FHVAE assumes it is a Gaussian distribution with an utterance-scale varying mean and a fixed variance. By setting a small fixed variance, the training process promotes identity variables within one utterance gathering close to the mean of their prior. However, this constraint is relatively weak, as the mean of the prior changes between utterances. Therefore, we introduce contrastive learning into the FHVAE framework, to make the speaker identity variables gather when representing the same speaker, while distancing themselves as far as possible from those of other speakers. The model structure has not been changed in this work, but only the training process, thus no additional cost is needed during testing. Voice conversion has been chosen as the application in this paper. Latent variable evaluations include speaker verification and identification for the speaker identity variable, and speech recognition for the content variable. Furthermore, assessments of voice conversion performance are on the grounds of fake speech detection experiments. Results show that the proposed method improves both speaker identity and content feature extraction compared to FHVAE, and has better performance than the baseline on conversion.
    The Neural Covariance SDE: Shaped Infinite Depth-and-Width Networks at Initialization. (arXiv:2206.02768v3 [stat.ML] UPDATED)
    The logit outputs of a feedforward neural network at initialization are conditionally Gaussian, given a random covariance matrix defined by the penultimate layer. In this work, we study the distribution of this random matrix. Recent work has shown that shaping the activation function as network depth grows large is necessary for this covariance matrix to be non-degenerate. However, the current infinite-width-style understanding of this shaping method is unsatisfactory for large depth: infinite-width analyses ignore the microscopic fluctuations from layer to layer, but these fluctuations accumulate over many layers. To overcome this shortcoming, we study the random covariance matrix in the shaped infinite-depth-and-width limit. We identify the precise scaling of the activation function necessary to arrive at a non-trivial limit, and show that the random covariance matrix is governed by a stochastic differential equation (SDE) that we call the Neural Covariance SDE. Using simulations, we show that the SDE closely matches the distribution of the random covariance matrix of finite networks. Additionally, we recover an if-and-only-if condition for exploding and vanishing norms of large shaped networks based on the activation function.
    On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline. (arXiv:2212.05749v2 [cs.LG] UPDATED)
    In this paper, we examine the effectiveness of pre-training for visuo-motor control tasks. We revisit a simple Learning-from-Scratch (LfS) baseline that incorporates data augmentation and a shallow ConvNet, and find that this baseline is surprisingly competitive with recent approaches (PVR, MVP, R3M) that leverage frozen visual representations trained on large-scale vision datasets -- across a variety of algorithms, task domains, and metrics in simulation and on a real robot. Our results demonstrate that these methods are hindered by a significant domain gap between the pre-training datasets and current benchmarks for visuo-motor control, which is alleviated by finetuning. Based on our findings, we provide recommendations for future research in pre-training for control and hope that our simple yet strong baseline will aid in accurately benchmarking progress in this area.
    Dynamic Interval Restrictions on Action Spaces in Deep Reinforcement Learning for Obstacle Avoidance. (arXiv:2306.08008v1 [cs.LG])
    Deep reinforcement learning algorithms typically act on a fixed set of actions. However, this is not sufficient for a wide range of real-world applications in which a different subset is available at each step. In this thesis, we consider the problem of interval restrictions as they occur in pathfinding with dynamic obstacles. When actions that lead to collisions are avoided, the continuous action space is split into variable parts. Recent research learns under strong assumptions on the number of intervals, is limited to convex subsets, or derives the available actions from the observations. Therefore, we propose two approaches that are independent of the state of the environment, by extending parameterized reinforcement learning and ConstraintNet to handle an arbitrary number of intervals. We demonstrate their performance in an obstacle avoidance task and compare the methods to penalties, projection, replacement, as well as discrete and continuous masking from the literature. The results suggest that discrete masking of action-values is the only effective method when constraints do not emerge during training. When restrictions are learned, the choice between projection, masking, and our ConstraintNet modification appears to depend on the task at hand. We compare the results under varying complexity and give directions for future work.
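The discrete masking baseline mentioned above can be sketched as invalidating the Q-values of discretized actions that fall outside the allowed intervals. The function name and grid discretization below are illustrative assumptions, not the thesis's implementation:

```python
import numpy as np

def masked_argmax(q_values, action_grid, allowed_intervals):
    """Discrete masking: set Q-values of discretized actions outside the
    allowed intervals to -inf, then take the argmax over the rest.
    Sketch of the masking baseline, not the proposed ConstraintNet variant."""
    mask = np.zeros_like(q_values, dtype=bool)
    for lo, hi in allowed_intervals:
        mask |= (action_grid >= lo) & (action_grid <= hi)
    q = np.where(mask, q_values, -np.inf)
    return action_grid[int(np.argmax(q))]
```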
    Better Generalization with Semantic IDs: A case study in Ranking for Recommendations. (arXiv:2306.08121v1 [cs.IR])
    Training good representations for items is critical in recommender models. Typically, an item is assigned a unique randomly generated ID and is represented by a learned embedding corresponding to the value of that random ID. Although widely used, this approach has limitations when the number of items is large and items are power-law distributed -- typical characteristics of real-world recommendation systems. This leads to the item cold-start problem, where the model is unable to make reliable inferences for tail and previously unseen items. Removing these ID features and their learned embeddings altogether to combat the cold-start issue severely degrades recommendation quality. Content-based item embeddings are more reliable, but they are expensive to store and use, particularly for a user's past item interaction sequence. In this paper, we use Semantic IDs, compact discrete item representations learned from content embeddings using RQ-VAE that capture the hierarchy of concepts in items. We showcase how we use them as a replacement for item IDs in a resource-constrained ranking model used on an industrial-scale video sharing platform. Moreover, we show how Semantic IDs improve the generalization ability of our system without sacrificing top-level metrics.
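The residual-quantization step behind a Semantic ID can be sketched as below, with random matrices standing in for the RQ-VAE-learned codebooks:

```python
import numpy as np

def semantic_id(embedding, codebooks):
    """Residual quantization: each level quantizes the residual left by the
    previous level, yielding a short tuple of discrete codes (a 'Semantic ID').
    The codebooks here are stand-ins for RQ-VAE-learned ones."""
    residual = np.asarray(embedding, dtype=float)
    codes = []
    for cb in codebooks:  # one codebook per hierarchy level
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        codes.append(idx)
        residual = residual - cb[idx]  # pass on what is left to encode
    return tuple(codes)
```

Because earlier levels quantize coarser structure, items with similar content tend to share a Semantic ID prefix, which is what enables generalization to unseen items.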
    How Robust is your Fair Model? Exploring the Robustness of Diverse Fairness Strategies. (arXiv:2207.04581v3 [cs.LG] UPDATED)
    With the introduction of machine learning in high-stakes decision making, ensuring algorithmic fairness has become an increasingly important problem to solve. In response to this, many mathematical definitions of fairness have been proposed, and a variety of optimisation techniques have been developed, all designed to maximise a defined notion of fairness. However, fair solutions are reliant on the quality of the training data, and can be highly sensitive to noise. Recent studies have shown that robustness (the ability for a model to perform well on unseen data) plays a significant role in the type of strategy that should be used when approaching a new problem and, hence, measuring the robustness of these strategies has become a fundamental problem. In this work, we therefore propose a new criterion to measure the robustness of various fairness optimisation strategies - the robustness ratio. We conduct multiple extensive experiments on five benchmark fairness data sets using three of the most popular fairness strategies with respect to four of the most popular definitions of fairness. Our experiments empirically show that fairness methods that rely on threshold optimisation are very sensitive to noise in all the evaluated data sets, despite mostly outperforming other methods. This is in contrast to the other two methods, which are less fair for low noise scenarios but fairer for high noise ones. To the best of our knowledge, we are the first to quantitatively evaluate the robustness of fairness optimisation strategies. This can potentially serve as a guideline in choosing the most suitable fairness strategy for various data sets.
    FP8 versus INT8 for efficient deep learning inference. (arXiv:2303.17951v2 [cs.LG] UPDATED)
    Recently, the idea of using FP8 as a number format for neural network training has been floating around the deep learning world. Given that most training is currently conducted with entire networks in FP32, or sometimes FP16 with mixed-precision, the step to having some parts of a network run in FP8 with 8-bit weights is an appealing potential speed-up for the generally costly and time-intensive training procedures in deep learning. A natural question arises regarding what this development means for efficient inference on edge devices. In the efficient inference device world, workloads are frequently executed in INT8, sometimes going even as low as INT4 when efficiency calls for it. In this whitepaper, we compare the performance of both the FP8 and INT formats for efficient on-device inference. We theoretically show the difference between the INT and FP formats for neural networks and present a plethora of post-training quantization and quantization-aware-training results to show how this theory translates to practice. We also provide a hardware analysis showing that the FP formats are somewhere between 50-180% less efficient in terms of compute in dedicated hardware than the INT format. Based on our research and a read of the research field, we conclude that although the proposed FP8 format could be good for training, the results for inference do not warrant a dedicated implementation of FP8 in favor of INT8 for efficient inference. We show that our results are mostly consistent with previous findings but that important comparisons between the formats have thus far been lacking. Finally, we discuss what happens when FP8-trained networks are converted to INT8 and conclude with a brief discussion on the most efficient way for on-device deployment and an extensive suite of INT8 results for many models.
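For reference, a toy symmetric per-tensor INT8 quantizer (one simple scheme among the many such comparisons study) looks like this; round-trip error is bounded by half the quantization step:

```python
import numpy as np

def int8_quantize(x):
    """Symmetric per-tensor INT8 quantization: one scale, values rounded
    into [-127, 127]. A toy illustration, not a production scheme."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_dequantize(q, scale):
    """Map the integer codes back to floating point."""
    return q.astype(np.float32) * scale
```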
    Data Augmentation for Seizure Prediction with Generative Diffusion Model. (arXiv:2306.08256v1 [eess.SP])
    Objective: Seizure prediction is of great importance to improve the life of patients. The focal point is to distinguish preictal states from interictal ones. With the development of machine learning, seizure prediction methods have achieved significant progress. However, the severe imbalance problem between preictal and interictal data still poses a great challenge, restricting the performance of classifiers. Data augmentation is an intuitive way to solve this problem. Existing data augmentation methods generate samples by overlapping or recombining data. The distribution of generated samples is limited by original data, because such transformations cannot fully explore the feature space and offer new information. As the epileptic EEG representation varies among seizures, these generated samples cannot provide enough diversity to achieve high performance on a new seizure. As a consequence, we propose a novel data augmentation method with diffusion model called DiffEEG. Methods: Diffusion models are a class of generative models that consist of two processes. Specifically, in the diffusion process, the model adds noise to the input EEG sample step by step and converts the noisy sample into output random noise, exploring the distribution of data by minimizing the loss between the output and the noise added. In the denoising process, the model samples the synthetic data by removing the noise gradually, diffusing the data distribution to outward areas and narrowing the distance between different clusters. Results: We compared DiffEEG with existing methods, and integrated them into three representative classifiers. The experiments indicate that DiffEEG could further improve the performance and shows superiority to existing methods. Conclusion: This paper proposes a novel and effective method to solve the imbalance problem and demonstrates the effectiveness and generality of our method.  ( 3 min )
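The forward (noising) process can be sketched in the standard DDPM closed form, where the signal is progressively replaced by Gaussian noise; DiffEEG's exact parameterization may differ:

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Closed-form forward (noising) step of a DDPM-style diffusion model:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise.
    Illustrative sketch; DiffEEG's exact schedule may differ."""
    alpha_bar = np.cumprod(1.0 - betas)[t]       # cumulative signal retention
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise
    return xt, noise

```

The denoising network is then trained to predict `noise` from `xt`, which is the loss "between the output and the noise added" mentioned above.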
    Private Federated Learning Without a Trusted Server: Optimal Algorithms for Convex Losses. (arXiv:2106.09779v8 [cs.LG] UPDATED)
    This paper studies federated learning (FL)--especially cross-silo FL--with data from people who do not trust the server or other silos. In this setting, each silo (e.g. hospital) has data from different people (e.g. patients) and must maintain the privacy of each person's data (e.g. medical record), even if the server or other silos act as adversarial eavesdroppers. This requirement motivates the study of Inter-Silo Record-Level Differential Privacy (ISRL-DP), which requires silo i's communications to satisfy record/item-level differential privacy (DP). ISRL-DP ensures that the data of each person (e.g. patient) in silo i (e.g. hospital i) cannot be leaked. ISRL-DP is different from well-studied privacy notions. Central and user-level DP assume that people trust the server/other silos. On the other end of the spectrum, local DP assumes that people do not trust anyone at all (even their own silo). Sitting between central and local DP, ISRL-DP makes the realistic assumption (in cross-silo FL) that people trust their own silo, but not the server or other silos. In this work, we provide tight (up to logarithms) upper and lower bounds for ISRL-DP FL with convex/strongly convex loss functions and homogeneous (i.i.d.) silo data. Remarkably, we show that similar bounds are attainable for smooth losses with arbitrary heterogeneous silo data distributions, via an accelerated ISRL-DP algorithm. We also provide tight upper and lower bounds for ISRL-DP federated empirical risk minimization, and use acceleration to attain the optimal bounds in fewer rounds of communication than the state-of-the-art. Finally, with a secure "shuffler" to anonymize silo messages (but without a trusted server), our algorithm attains the optimal central DP rates under more practical trust assumptions. Numerical experiments show favorable privacy-accuracy tradeoffs for our algorithm in classification and regression tasks.  ( 3 min )
    Parallel Successive Learning for Dynamic Distributed Model Training over Heterogeneous Wireless Networks. (arXiv:2202.02947v6 [cs.LG] UPDATED)
    Federated learning (FedL) has emerged as a popular technique for distributing model training over a set of wireless devices, via iterative local updates (at devices) and global aggregations (at the server). In this paper, we develop parallel successive learning (PSL), which expands the FedL architecture along three dimensions: (i) Network, allowing decentralized cooperation among the devices via device-to-device (D2D) communications. (ii) Heterogeneity, interpreted at three levels: (ii-a) Learning: PSL considers a heterogeneous number of stochastic gradient descent iterations with different mini-batch sizes at the devices; (ii-b) Data: PSL presumes a dynamic environment with data arrival and departure, where the distributions of local datasets evolve over time, captured via a new metric for model/concept drift. (ii-c) Device: PSL considers devices with different computation and communication capabilities. (iii) Proximity, where devices have different distances to each other and the access point. PSL considers the realistic scenario where global aggregations are conducted with idle times in-between them for resource efficiency improvements, and incorporates data dispersion and model dispersion with local model condensation into FedL. Our analysis sheds light on the notion of cold vs. warmed up models, and model inertia in distributed machine learning. We then propose network-aware dynamic model tracking to optimize the model learning vs. resource efficiency tradeoff, which we show is an NP-hard signomial programming problem. We finally solve this problem by proposing a general optimization solver. Our numerical results reveal new findings on the interdependencies between the idle times in-between the global aggregations, model/concept drift, and D2D cooperation configuration.  ( 3 min )
    Topological Experience Replay. (arXiv:2203.15845v2 [cs.LG] UPDATED)
    State-of-the-art deep Q-learning methods update Q-values using state transition tuples sampled from the experience replay buffer. This strategy often samples data uniformly at random, or prioritizes sampling based on measures such as the temporal difference (TD) error. Such sampling strategies can be inefficient at learning the Q-function because a state's Q-value depends on the Q-value of successor states. If the data sampling strategy ignores the precision of the Q-value estimate of the next state, it can lead to useless and often incorrect updates to the Q-values. To mitigate this issue, we organize the agent's experience into a graph that explicitly tracks the dependency between Q-values of states. Each edge in the graph represents a transition between two states by executing a single action. We perform value backups via a breadth-first search that expands vertices in the graph starting from the set of terminal states and successively moving backward. We empirically show that our method is substantially more data-efficient than several baselines on a diverse range of goal-reaching tasks. Notably, the proposed method also outperforms baselines that consume more batches of training experience, and operates from high-dimensional observational data such as images.  ( 2 min )
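The backward breadth-first ordering of value backups can be sketched on a deterministic toy MDP; the function below is an illustrative simplification of the ordering idea, not the full method:

```python
from collections import deque

def topological_backups(graph, terminal_values, gamma=0.99):
    """Backward breadth-first value backups: start from terminal states and
    expand predecessors, so a state's value is updated only after its
    successors'. `graph` maps state -> list of (action, reward, next_state).
    Deterministic toy sketch of the ordering idea."""
    preds = {}  # reverse edges: next_state -> states that reach it
    for s, edges in graph.items():
        for (_, r, s2) in edges:
            preds.setdefault(s2, []).append(s)
    V = dict(terminal_values)
    queue = deque(terminal_values)
    while queue:
        s2 = queue.popleft()
        for s in preds.get(s2, []):
            # greedy one-step backup over s's outgoing transitions
            v = max(r + gamma * V.get(ns, 0.0) for (_, r, ns) in graph[s])
            if V.get(s) != v:
                V[s] = v
                queue.append(s)
    return V
```

Because terminal values are exact, every backup in this sweep uses an already-refined successor estimate, which is the inefficiency that uniform replay sampling ignores.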
    Distributionally Robust Data Join. (arXiv:2202.05797v2 [cs.LG] UPDATED)
    Suppose we are given two datasets: a labeled dataset and an unlabeled dataset which also has additional auxiliary features not present in the first dataset. What is the most principled way to use these datasets together to construct a predictor? The answer should depend upon whether these datasets are generated by the same or different distributions over their mutual feature sets, and how similar the test distribution will be to either of those distributions. In many applications, the two datasets will likely follow different distributions, but both may be close to the test distribution. We introduce the problem of building a predictor which minimizes the maximum loss over all probability distributions over the original features, auxiliary features, and binary labels, whose Wasserstein distance is $r_1$ away from the empirical distribution over the labeled dataset and $r_2$ away from that of the unlabeled dataset. This can be thought of as a generalization of distributionally robust optimization (DRO), which allows for two data sources, one of which is unlabeled and may contain auxiliary features.  ( 2 min )
    Bilinear value networks. (arXiv:2204.13695v2 [cs.AI] UPDATED)
    The dominant framework for off-policy multi-goal reinforcement learning involves estimating a goal-conditioned Q-value function. When learning to achieve multiple goals, data efficiency is intimately connected with the generalization of the Q-function to new goals. The de-facto paradigm is to approximate Q(s, a, g) using monolithic neural networks. To improve the generalization of the Q-function, we propose a bilinear decomposition that represents the Q-value via a low-rank approximation in the form of a dot product between two vector fields. The first vector field, f(s, a), captures the environment's local dynamics at the state s, whereas the second component, {\phi}(s, g), captures the global relationship between the current state and the goal. We show that our bilinear decomposition scheme substantially improves data efficiency and has superior transfer to out-of-distribution goals compared to prior methods. Empirical evidence is provided on the simulated Fetch robot task-suite and dexterous manipulation with a Shadow hand.  ( 2 min )
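A minimal sketch of the low-rank structure, with random matrices standing in for the two learned vector fields: however many goals are added, the induced Q matrix has rank at most the dimension of the decomposition.

```python
import numpy as np

# Bilinear decomposition: Q(s, a, g) = f(s, a) . phi(s, g).
# f and phi would be small neural networks in practice; random matrices
# stand in for their outputs here.
rng = np.random.default_rng(0)
d = 4                                # dimension of the two vector fields
f = rng.standard_normal((10, d))     # f(s, a) for 10 (state, action) pairs
phi = rng.standard_normal((8, d))    # phi(s, g) for 8 goals
Q = f @ phi.T                        # Q[i, j] = f_i . phi_j
```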
    SC2 Benchmark: Supervised Compression for Split Computing. (arXiv:2203.08875v2 [cs.LG] UPDATED)
    With the increasing demand for deep learning models on mobile devices, splitting neural network computation between the device and a more powerful edge server has become an attractive solution. However, existing split computing approaches often underperform compared to a naive baseline of remote computation on compressed data. Recent studies propose learning compressed representations that contain more relevant information for supervised downstream tasks, showing improved tradeoffs between compressed data size and supervised performance. However, existing evaluation metrics only provide an incomplete picture of split computing. This study introduces supervised compression for split computing (SC2) and proposes new evaluation criteria: minimizing computation on the mobile device, minimizing transmitted data size, and maximizing model accuracy. We conduct a comprehensive benchmark study using 10 baseline methods, three computer vision tasks, and over 180 trained models, and discuss various aspects of SC2. We also release sc2bench, a Python package for future research on SC2. Our proposed metrics and package will help researchers better understand the tradeoffs of supervised compression in split computing.  ( 2 min )
    Imagery Tracking of Sun Activity Using 2D Circular Kernel Time Series Transformation, Entropy Measures and Machine Learning Approaches. (arXiv:2306.08270v1 [astro-ph.SR])
    The sun is highly complex in nature, and its observatory imagery is one of the most important sources of information about sun activity and space and Earth weather conditions. NASA's Solar Dynamics Observatory captures approximately 70,000 images of sun activity per day, and continuous visual inspection of these solar observatory images is challenging. In this study, we developed a technique for tracking the sun's activity using a 2D circular kernel time series transformation, statistical and entropy measures, and machine learning approaches. The technique involves transforming a solar observatory image section into a 1-dimensional time series (1-DTS), while statistical and entropy measures (Approach 1) or direct classification (Approach 2) are used to extract features from the 1-DTS for machine learning classification into 'solar storm' and 'no storm'. We found that the potential accuracy of the model in tracking the activity of the sun is approximately 0.981 for Approach 1 and 0.999 for Approach 2. The stability of the developed approach to rotational transformation of the solar observatory image is evident. When training on the original dataset for Approach 1, the match index (T90) of the distribution of solar storm areas reaches T90 ~ 0.993, and T90 ~ 0.951 for Approach 2. In addition, when using the extended training base, the match indices increased to T90 ~ 0.994 and T90 ~ 1, respectively. This model consistently classifies areas with swirling magnetic lines associated with solar storms and is robust to image rotation, glare, and optical artifacts.  ( 3 min )
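One plausible reading of the "2D circular kernel time series transformation" is sampling image intensities along a circle, turning a 2-D region into a 1-D series indexed by angle; the sketch below is an assumption about the construction, not the authors' code:

```python
import numpy as np

def circular_kernel_series(image, center, radius, n_angles=360):
    """Sample image intensities along a circle of given radius, producing a
    1-D series indexed by angle. Hypothetical sketch of the circular-kernel
    idea; image rotation becomes a circular shift of the series."""
    cy, cx = center
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    ys = np.clip(np.round(cy + radius * np.sin(angles)).astype(int), 0, image.shape[0] - 1)
    xs = np.clip(np.round(cx + radius * np.cos(angles)).astype(int), 0, image.shape[1] - 1)
    return image[ys, xs]
```

Under this construction, rotating the image only circularly shifts the series, which is consistent with the rotational stability reported above.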
    We Should at Least Be Able to Design Molecules That Dock Well. (arXiv:2006.16955v5 [q-bio.BM] UPDATED)
    Designing compounds with desired properties is a key element of the drug discovery process. However, measuring progress in the field has been challenging due to the lack of realistic retrospective benchmarks, and the large cost of prospective validation. To close this gap, we propose a benchmark based on docking, a popular computational method for assessing molecule binding to a protein. Concretely, the goal is to generate drug-like molecules that are scored highly by SMINA, a popular docking software. We observe that popular graph-based generative models fail to generate molecules with a high docking score when trained using a realistically sized training set. This suggests a limitation of the current incarnation of models for de novo drug design. Finally, we propose a simplified version of the benchmark based on a simpler scoring function, and show that the tested models are able to partially solve it. We release the benchmark as an easy-to-use package available at https://github.com/cieplinski-tobiasz/smina-docking-benchmark. We hope that our benchmark will serve as a stepping stone towards the goal of automatically generating promising drug candidates.  ( 3 min )
    Detection of sepsis during emergency department triage using machine learning. (arXiv:2204.07657v6 [cs.LG] UPDATED)
    Sepsis is a life-threatening condition with organ dysfunction and is a leading cause of death and critical illness worldwide. Even a few hours of delay in the treatment of sepsis results in increased mortality. Early detection of sepsis during emergency department triage would allow early initiation of lab analysis, antibiotic administration, and other sepsis treatment protocols. The purpose of this study was to compare sepsis detection performance at ED triage (prior to the use of laboratory diagnostics) of the standard sepsis screening algorithm (SIRS with source of infection) and a machine learning algorithm trained on EHR triage data. A machine learning model (KATE Sepsis) was developed using patient encounters with triage data from 16 participating hospitals. KATE Sepsis and standard screening were retrospectively evaluated on the adult population of 512,949 medical records. KATE Sepsis demonstrates an AUC of 0.9423 (0.9401 - 0.9441) with sensitivity of 71.09% (70.12% - 71.98%) and specificity of 94.81% (94.75% - 94.87%). Standard screening demonstrates an AUC of 0.6826 (0.6774 - 0.6878) with sensitivity of 40.8% (39.71% - 41.86%) and specificity of 95.72% (95.68% - 95.78%). The KATE Sepsis model trained to detect sepsis demonstrates 77.67% (75.78% - 79.42%) sensitivity in detecting severe sepsis and 86.95% (84.2% - 88.81%) sensitivity in detecting septic shock. The standard screening protocol demonstrates 43.06% (41% - 45.87%) sensitivity in detecting severe sepsis and 40% (36.55% - 43.26%) sensitivity in detecting septic shock. Future research should focus on the prospective impact of KATE Sepsis on administration of antibiotics, readmission rate, morbidity and mortality.  ( 3 min )
    Entropic Issues in Likelihood-Based OOD Detection. (arXiv:2109.10794v2 [stat.ML] UPDATED)
    Deep generative models trained by maximum likelihood remain very popular methods for reasoning about data probabilistically. However, it has been observed that they can assign higher likelihoods to out-of-distribution (OOD) data than in-distribution data, thus calling into question the meaning of these likelihood values. In this work we provide a novel perspective on this phenomenon, decomposing the average likelihood into a KL divergence term and an entropy term. We argue that the latter can explain the curious OOD behaviour mentioned above, suppressing likelihood values on datasets with higher entropy. Although our idea is simple, we have not seen it explored yet in the literature. This analysis provides further explanation for the success of OOD detection methods based on likelihood ratios, as the problematic entropy term cancels out in expectation. Finally, we discuss how this observation relates to recent success in OOD detection with manifold-supported models, for which the above decomposition does not hold directly.  ( 2 min )
    WARM: A Weakly (+Semi) Supervised Model for Solving Math word Problems. (arXiv:2104.06722v2 [cs.CL] UPDATED)
    Solving math word problems (MWPs) is an important and challenging problem in natural language processing. Existing approaches to solve MWPs require full supervision in the form of intermediate equations. However, labeling every MWP with its corresponding equations is a time-consuming and expensive task. In order to address this challenge of equation annotation, we propose a weakly supervised model for solving MWPs by requiring only the final answer as supervision. We approach this problem by first learning to generate the equation using the problem description and the final answer, which we subsequently use to train a supervised MWP solver. We propose and compare various weakly supervised techniques to learn to generate equations directly from the problem description and answer. Through extensive experiments, we demonstrate that without using equations for supervision, our approach achieves accuracy gains of 4.5% and 32% over the state-of-the-art weakly supervised approach, on the standard Math23K and AllArith datasets respectively. Additionally, we curate and release new datasets of roughly 10k MWPs each in English and in Hindi (a low-resource language). These datasets are suitable for training weakly supervised models. We also present an extension of WARM to semi-supervised learning and present further improvements on results, along with insights.  ( 2 min )
    $m^\ast$ of two-dimensional electron gas: a neural canonical transformation study. (arXiv:2201.03156v2 [cond-mat.stat-mech] UPDATED)
    The quasiparticle effective mass $m^\ast$ of interacting electrons is a fundamental quantity in the Fermi liquid theory. However, the precise value of the effective mass of uniform electron gas is still elusive after decades of research. The newly developed neural canonical transformation approach [Xie et al., J. Mach. Learn. 1, (2022)] offers a principled way to extract the effective mass of electron gas by directly calculating the thermal entropy at low temperature. The approach models a variational many-electron density matrix using two generative neural networks: an autoregressive model for momentum occupation and a normalizing flow for electron coordinates. Our calculation reveals a suppression of effective mass in the two-dimensional spin-polarized electron gas, which is more pronounced than previous reports in the low-density strong-coupling region. This prediction calls for verification in two-dimensional electron gas experiments.  ( 2 min )
    Short-Term Density Forecasting of Low-Voltage Load using Bernstein-Polynomial Normalizing Flows. (arXiv:2204.13939v3 [cs.LG] UPDATED)
    The transition to a fully renewable energy grid requires better forecasting of demand at the low-voltage level to increase efficiency and ensure reliable control. However, high fluctuations and increasing electrification cause huge forecast variability, not reflected in traditional point estimates. Probabilistic load forecasts take future uncertainties into account and thus allow more informed decision-making for the planning and operation of low-carbon energy systems. We propose an approach for flexible conditional density forecasting of short-term load based on Bernstein polynomial normalizing flows, where a neural network controls the parameters of the flow. In an empirical study with 363 smart meter customers, our density predictions compare favorably against Gaussian and Gaussian mixture densities. Also, they outperform a non-parametric approach based on the pinball loss for 24h-ahead load forecasting for two different neural network architectures.  ( 2 min )
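The core transformation can be sketched as a Bernstein polynomial whose nondecreasing coefficients guarantee monotonicity; in the proposed method a neural network would output the coefficients conditioned on covariates, whereas they are fixed here:

```python
import numpy as np
from math import comb

def bernstein_transform(z, coeffs):
    """Evaluate a Bernstein polynomial sum_k c_k * B_{k,M}(z) for z in [0, 1].
    With nondecreasing coefficients the map is monotone, which is what makes
    it usable as a normalizing-flow transformation. Illustrative sketch."""
    M = len(coeffs) - 1
    # basis[k] = binom(M, k) * z^k * (1 - z)^(M - k), shape (M + 1, len(z))
    basis = np.array([comb(M, k) * z**k * (1 - z)**(M - k) for k in range(M + 1)])
    return np.tensordot(coeffs, basis, axes=1)
```

A convenient property visible in the sketch: the transform interpolates its first and last coefficients at the interval endpoints, so the coefficients directly control the output range.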
    Towards Green Automated Machine Learning: Status Quo and Future Directions. (arXiv:2111.05850v4 [cs.LG] UPDATED)
    Automated machine learning (AutoML) strives for the automatic configuration of machine learning algorithms and their composition into an overall (software) solution - a machine learning pipeline - tailored to the learning task (dataset) at hand. Over the last decade, AutoML has developed into an independent research field with hundreds of contributions. At the same time, AutoML is being criticised for its high resource consumption, as many approaches rely on the (costly) evaluation of many machine learning pipelines, as well as on expensive large-scale experiments across many datasets and approaches. In the spirit of recent work on Green AI, this paper proposes Green AutoML, a paradigm to make the whole AutoML process more environmentally friendly. To this end, we first elaborate on how to quantify the environmental footprint of an AutoML tool. Afterward, different strategies on how to design and benchmark an AutoML tool with respect to their "greenness", i.e. sustainability, are summarized. Finally, we elaborate on how to be transparent about the environmental footprint and what kind of research incentives could direct the community into a more sustainable AutoML research direction. Additionally, we propose a sustainability checklist to be attached to every AutoML paper featuring all core aspects of Green AutoML.  ( 3 min )
    Investigating Membership Inference Attacks under Data Dependencies. (arXiv:2010.12112v4 [cs.CR] UPDATED)
    Training machine learning models on privacy-sensitive data has become a popular practice, driving innovation in ever-expanding fields. This has opened the door to new attacks that can have serious privacy implications. One such attack, the Membership Inference Attack (MIA), exposes whether or not a particular data point was used to train a model. A growing body of literature uses Differentially Private (DP) training algorithms as a defence against such attacks. However, these works evaluate the defence under the restrictive assumption that all members of the training set, as well as non-members, are independent and identically distributed. This assumption does not hold for many real-world use cases in the literature. Motivated by this, we evaluate membership inference with statistical dependencies among samples and explain why DP does not provide meaningful protection (the privacy parameter $\epsilon$ scales with the training set size $n$) in this more general case. We conduct a series of empirical evaluations with off-the-shelf MIAs using training sets built from real-world data showing different types of dependencies among samples. Our results reveal that training set dependencies can severely increase the performance of MIAs, and therefore assuming that data samples are statistically independent can significantly underestimate the performance of MIAs.  ( 2 min )
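A classic off-the-shelf MIA of the kind evaluated here is the loss-threshold attack: predict "member" when the model's loss on a record is below a threshold, since members tend to be fit better. The sketch below computes its advantage (TPR minus FPR); it is a toy baseline, not the specific attacks used in the paper:

```python
import numpy as np

def mia_advantage(member_losses, nonmember_losses, threshold):
    """Loss-threshold membership inference: predict 'member' when the loss
    is below the threshold. Advantage = TPR - FPR; 0 means the attack is
    no better than chance. Toy sketch of an off-the-shelf attack."""
    tpr = float(np.mean(np.asarray(member_losses) < threshold))
    fpr = float(np.mean(np.asarray(nonmember_losses) < threshold))
    return tpr - fpr
```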
    On the Omnipresence of Spurious Local Minima in Certain Neural Network Training Problems. (arXiv:2202.12262v2 [cs.LG] UPDATED)
    We study the loss landscape of training problems for deep artificial neural networks with a one-dimensional real output whose activation functions contain an affine segment and whose hidden layers have width at least two. It is shown that such problems possess a continuum of spurious (i.e., not globally optimal) local minima for all target functions that are not affine. In contrast to previous works, our analysis covers all sampling and parameterization regimes, general differentiable loss functions, arbitrary continuous nonpolynomial activation functions, and both the finite- and infinite-dimensional setting. It is further shown that the appearance of the spurious local minima in the considered training problems is a direct consequence of the universal approximation theorem and that the underlying mechanisms also cause, e.g., $L^p$-best approximation problems to be ill-posed in the sense of Hadamard for all networks that do not have a dense image. The latter result also holds without the assumption of local affine linearity and without any conditions on the hidden layers.  ( 2 min )
    Factorized linear discriminant analysis and its application in computational biology. (arXiv:2010.02171v5 [q-bio.QM] UPDATED)
    Navigating the complex landscape of single-cell transcriptomic data presents significant challenges. Central to this challenge is the identification of a meaningful representation of high-dimensional gene expression patterns that sheds light on the structural and functional properties of cell types. Pursuing model interpretability and computational simplicity, we often look for a linear transformation of the original data that aligns with key phenotypic features of cells. In response to this need, we introduce factorized linear discriminant analysis (FLDA), a novel method for linear dimensionality reduction. The crux of FLDA lies in identifying a linear function of gene expression levels that is highly correlated with one phenotypic feature while minimizing the influence of others. To augment this method, we integrate it with a sparsity-based regularization algorithm. This integration is crucial as it selects a subset of genes pivotal to a specific phenotypic feature or a combination thereof. To illustrate the effectiveness of FLDA, we apply it to transcriptomic datasets from neurons in the Drosophila optic lobe. We demonstrate that FLDA not only captures the inherent structural patterns aligned with phenotypic features but also uncovers key genes associated with each phenotype.  ( 2 min )
    Unraveling the ARC Puzzle: Mimicking Human Solutions with Object-Centric Decision Transformer. (arXiv:2306.08204v1 [cs.AI])
    In the pursuit of artificial general intelligence (AGI), we tackle Abstraction and Reasoning Corpus (ARC) tasks using a novel two-pronged approach. We employ the Decision Transformer in an imitation learning paradigm to model human problem-solving, and introduce an object detection algorithm, the Push and Pull clustering method. This dual strategy enhances AI's ARC problem-solving skills and provides insights for AGI progression. Yet, our work reveals the need for advanced data collection tools, robust training datasets, and refined model structures. This study highlights potential improvements for Decision Transformers and propels future AGI research.  ( 2 min )
    Contextual Font Recommendations based on User Intent. (arXiv:2306.08188v1 [cs.HC])
    Adobe Fonts has a rich library of over 20,000 unique fonts that Adobe users utilize for creating graphics, posters, composites etc. Due to the nature of the large library, knowing what font to select can be a daunting task that requires a lot of experience. For most users in Adobe products, especially casual users of Adobe Express, this often means choosing the default font instead of utilizing the rich and diverse fonts available. In this work, we create an intent-driven system to provide contextual font recommendations to users to aid in their creative journey. Our system takes in multilingual text input and recommends suitable fonts based on the user's intent. Based on user entitlements, the mix of free and paid fonts is adjusted. The feature is currently used by millions of Adobe Express users with a CTR of >25%.  ( 2 min )
    Improving Speech Enhancement Performance by Leveraging Contextual Broad Phonetic Class Information. (arXiv:2011.07442v4 [cs.SD] UPDATED)
    Previous studies have confirmed that by augmenting acoustic features with the place/manner of articulatory features, the speech enhancement (SE) process can be guided to consider the broad phonetic properties of the input speech when performing enhancement to attain performance improvements. In this paper, we explore the contextual information of articulatory attributes as additional information to further benefit SE. More specifically, we propose to improve the SE performance by leveraging losses from an end-to-end automatic speech recognition (E2E-ASR) model that predicts the sequence of broad phonetic classes (BPCs). We also developed multi-objective training with ASR and perceptual losses to train the SE system based on a BPC-based E2E-ASR. Experimental results from speech denoising, speech dereverberation, and impaired speech enhancement tasks confirmed that contextual BPC information improves SE performance. Moreover, the SE model trained with the BPC-based E2E-ASR outperforms that with the phoneme-based E2E-ASR. The results suggest that objectives with misclassification of phonemes by the ASR system may lead to imperfect feedback, and BPC could be a potentially better choice. Finally, it is noted that combining the most-confusable phonetic targets into the same BPC when calculating the additional objective can effectively improve the SE performance.  ( 3 min )
    Autoencoding Under Normalization Constraints. (arXiv:2105.05735v3 [cs.LG] UPDATED)
    Likelihood is a standard estimate for outlier detection. The role of the normalization constraint is to ensure that the out-of-distribution (OOD) regime has a small likelihood when samples are fit by maximum likelihood. Because autoencoders do not possess such a normalization process, they often fail to recognize outliers even when they are obviously OOD. We propose the Normalized Autoencoder (NAE), a normalized probabilistic model constructed from an autoencoder. The probability density of NAE is defined via the reconstruction error of an autoencoder, which differs from how energy is defined in a conventional energy-based model. In our model, normalization is enforced by suppressing the reconstruction of negative samples, significantly improving outlier detection performance. Our experimental results confirm the efficacy of NAE, both in detecting outliers and in generating in-distribution samples.  ( 2 min )
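    The density definition described above can be sketched with a tiny tied-weight linear autoencoder in NumPy; the model, weights, and input are illustrative stand-ins, not the paper's architecture:

```python
import numpy as np

def recon_error(x, W):
    """Energy of a tied-weight linear autoencoder: squared reconstruction
    error. An NAE-style model defines the (unnormalized) density as
    p(x) proportional to exp(-recon_error(x))."""
    z = W @ x          # encode to a low-dimensional code
    x_hat = W.T @ z    # decode with tied weights
    return np.sum((x - x_hat) ** 2)

rng = np.random.default_rng(1)
W = 0.1 * rng.normal(size=(2, 8))  # untrained, illustrative weights
x_in = rng.normal(size=8)
# Unnormalized log-density; the normalizing constant would require
# integrating exp(-energy) over the whole input space.
log_p_unnorm = -recon_error(x_in, W)
```

    The paper's contribution is in how this density is normalized during training (by suppressing reconstruction of negative samples), which a plain autoencoder lacks.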
    Robust Sample Weighting to Facilitate Individualized Treatment Rule Learning for a Target Population. (arXiv:2105.00581v2 [stat.ML] UPDATED)
    Learning individualized treatment rules (ITRs) is an important topic in precision medicine. Current literature mainly focuses on deriving ITRs from a single source population. We consider the observational data setting when the source population differs from a target population of interest. Compared with causal generalization for the average treatment effect which is a scalar quantity, ITR generalization poses new challenges due to the need to model and generalize the rules based on a prespecified class of functions which may not contain the unrestricted true optimal ITR. The aim of this paper is to develop a weighting framework to mitigate the impact of such misspecification and thus facilitate the generalizability of optimal ITRs from a source population to a target population. Our method seeks covariate balance over a non-parametric function class characterized by a reproducing kernel Hilbert space and can improve many ITR learning methods that rely on weights. We show that the proposed method encompasses importance weights and overlap weights as two extreme cases, allowing for a better bias-variance trade-off in between. Numerical examples demonstrate that the use of our weighting method can greatly improve ITR estimation for the target population compared with other weighting methods.  ( 2 min )
    Differentially Private Wireless Federated Learning Using Orthogonal Sequences. (arXiv:2306.08280v1 [cs.IT])
    We propose a novel privacy-preserving uplink over-the-air computation (AirComp) method, termed FLORAS, for single-input single-output (SISO) wireless federated learning (FL) systems. From the communication design perspective, FLORAS eliminates the requirement of channel state information at the transmitters (CSIT) by leveraging the properties of orthogonal sequences. From the privacy perspective, we prove that FLORAS can offer both item-level and client-level differential privacy (DP) guarantees. Moreover, by adjusting the system parameters, FLORAS can flexibly achieve different DP levels at no additional cost. A novel FL convergence bound is derived which, combined with the privacy guarantees, allows for a smooth tradeoff between convergence rate and differential privacy levels. Numerical results demonstrate the advantages of FLORAS compared with the baseline AirComp method, and validate that our analytical results can guide the design of privacy-preserving FL with different tradeoff requirements on the model convergence and privacy levels.  ( 2 min )
    QuantumNAT: Quantum Noise-Aware Training with Noise Injection, Quantization and Normalization. (arXiv:2110.11331v4 [cs.LG] UPDATED)
    Parameterized Quantum Circuits (PQC) are promising towards quantum advantage on near-term quantum hardware. However, due to large quantum noise (errors), the performance of PQC models degrades severely on real quantum devices. Taking Quantum Neural Network (QNN) as an example, the accuracy gap between noise-free simulation and noisy results on IBMQ-Yorktown for MNIST-4 classification is over 60%. Existing noise mitigation methods are general ones that do not leverage unique characteristics of PQC; on the other hand, existing PQC work does not consider noise effects. To this end, we present QuantumNAT, a PQC-specific framework that performs noise-aware optimizations in both the training and inference stages to improve robustness. We experimentally observe that the effect of quantum noise on a PQC measurement outcome is a linear map of the noise-free outcome, with a scaling and a shift factor. Motivated by that, we propose post-measurement normalization to mitigate the feature distribution differences between noise-free and noisy scenarios. Furthermore, to improve robustness against noise, we propose noise injection into the training process by inserting quantum error gates into the PQC according to realistic noise models of quantum hardware. Finally, post-measurement quantization is introduced to quantize the measurement outcomes to discrete values, achieving a denoising effect. Extensive experiments on 8 classification tasks using 6 quantum devices demonstrate that QuantumNAT improves accuracy by up to 43%, and achieves over 94% 2-class, 80% 4-class, and 34% 10-class classification accuracy measured on real quantum computers. The code for construction and noise-aware training of PQC is available in the TorchQuantum library.  ( 3 min )
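    The post-measurement normalization idea can be sketched as follows, under the paper's observed affine noise model (a scaling and a shift); the noise parameters below are invented for illustration:

```python
import numpy as np

def post_measurement_normalize(meas, eps=1e-8):
    """Standardize measurement outcomes per feature across a batch.
    If noisy outcomes are (approximately) an affine map a*x + b of the
    noise-free ones, standardization cancels both a and b."""
    mu = meas.mean(axis=0, keepdims=True)
    sigma = meas.std(axis=0, keepdims=True)
    return (meas - mu) / (sigma + eps)

rng = np.random.default_rng(0)
clean = rng.normal(size=(64, 4))  # noise-free expectation values (synthetic)
noisy = 0.6 * clean + 0.3         # assumed affine noise model: scale + shift

z_clean = post_measurement_normalize(clean)
z_noisy = post_measurement_normalize(noisy)
max_gap = np.abs(z_clean - z_noisy).max()  # ~0 under the affine model
```

    Because standardization is invariant to per-feature affine maps, the normalized noisy features coincide with the normalized noise-free ones under this model.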
    Learning on Graphs under Label Noise. (arXiv:2306.08194v1 [cs.LG])
    Node classification on graphs is a significant task with a wide range of applications, including social analysis and anomaly detection. Even though graph neural networks (GNNs) have produced promising results on this task, current techniques often presume that the label information of nodes is accurate, which may not be the case in real-world applications. To tackle this issue, we investigate the problem of learning on graphs with label noise and develop a novel approach dubbed Consistent Graph Neural Network (CGNN) to solve it. Specifically, we employ graph contrastive learning as a regularization term, which promotes two views of augmented nodes to have consistent representations. Since this regularization term cannot utilize label information, it can enhance the robustness of node representations to label noise. Moreover, to detect noisy labels on the graph, we present a sample selection technique based on the homophily assumption, which identifies noisy nodes by measuring the consistency between their labels and those of their neighbors. Finally, we purify these confident noisy labels to permit efficient semantic graph learning. Extensive experiments on three well-known benchmark datasets demonstrate the superiority of our CGNN over competing approaches.  ( 2 min )
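    The homophily-based noisy-label detection can be sketched in a few lines; this is a simplified stand-in for CGNN's sample selection, run on a hypothetical toy graph:

```python
def neighbor_label_consistency(labels, adj):
    """Fraction of each node's neighbors that share its label. Under the
    homophily assumption, low-consistency nodes are flagged as likely
    noisy-labeled (a simplified stand-in for CGNN's sample selection)."""
    scores = []
    for v, nbrs in enumerate(adj):
        if not nbrs:
            scores.append(1.0)
            continue
        agree = sum(labels[u] == labels[v] for u in nbrs)
        scores.append(agree / len(nbrs))
    return scores

# Toy graph: two homophilous triangles {0,1,2} and {3,4,5}; node 2's
# label has been flipped from 0 to 1 to simulate label noise.
adj = [[1, 2], [0, 2], [0, 1], [4, 5], [3, 5], [3, 4]]
labels = [0, 0, 1, 1, 1, 1]
scores = neighbor_label_consistency(labels, adj)
noisiest = min(range(len(scores)), key=scores.__getitem__)  # node 2
```

    The flipped node disagrees with all of its neighbors and receives the lowest consistency score, so it would be selected for label purification.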
    Focusing on Potential Named Entities During Active Label Acquisition. (arXiv:2111.03837v3 [cs.CL] UPDATED)
    Named entity recognition (NER) aims to identify mentions of named entities in an unstructured text and classify them into predefined named entity classes. While deep learning-based pre-trained language models help to achieve good predictive performances in NER, many domain-specific NER applications still call for a substantial amount of labeled data. Active learning (AL), a general framework for the label acquisition problem, has been used for NER tasks to minimize the annotation cost without sacrificing model performance. However, the heavily imbalanced class distribution of tokens introduces challenges in designing effective AL querying methods for NER. We propose several AL sentence query evaluation functions that pay more attention to potential positive tokens, and evaluate these proposed functions with both sentence-based and token-based cost evaluation strategies. We also propose a better data-driven normalization approach to penalize sentences that are too long or too short. Our experiments on three datasets from different domains reveal that the proposed approach reduces the number of annotated tokens while achieving better or comparable prediction performance with conventional methods.  ( 2 min )
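    A hedged sketch of the general idea: score candidate sentences by their summed token-level entity probabilities, with a length normalizer that penalizes overly long or short sentences. The exact query evaluation functions in the paper differ, and the probabilities below are invented:

```python
def sentence_query_score(token_entity_probs, alpha=1.0):
    """Active-learning query score for a sentence: sum of the model's
    per-token probabilities of being part of a named entity, divided by
    a length penalty n**alpha (a simplified normalization scheme)."""
    n = len(token_entity_probs)
    if n == 0:
        return 0.0
    return sum(token_entity_probs) / (n ** alpha)

# Hypothetical per-token P(token is part of an entity) from the model:
dense_score = sentence_query_score([0.9, 0.8, 0.1, 0.7])      # entity-rich
sparse_score = sentence_query_score([0.05, 0.02, 0.01, 0.03])  # entity-poor
```

    Under such a score, annotation effort is steered toward sentences likely to contain positive (entity) tokens, countering the heavy class imbalance.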
    MMASD: A Multimodal Dataset for Autism Intervention Analysis. (arXiv:2306.08243v1 [cs.CV])
    Autism spectrum disorder (ASD) is a developmental disorder characterized by significant social communication impairments and difficulties perceiving and presenting communication cues. Machine learning techniques have been broadly adopted to facilitate autism studies and assessments. However, computational models are primarily concentrated on specific analysis and validated on private datasets in the autism community, which limits comparisons across models due to privacy-preserving data sharing complications. This work presents a novel privacy-preserving open-source dataset, MMASD as a MultiModal ASD benchmark dataset, collected from play therapy interventions of children with Autism. MMASD includes data from 32 children with ASD, and 1,315 data samples segmented from over 100 hours of intervention recordings. To promote public access, each data sample consists of four privacy-preserving modalities of data: (1) optical flow, (2) 2D skeleton, (3) 3D skeleton, and (4) clinician ASD evaluation scores of children, e.g., ADOS scores. MMASD aims to assist researchers and therapists in understanding children's cognitive status, monitoring their progress during therapy, and customizing the treatment plan accordingly. It also offers inspiration for downstream tasks such as action quality assessment and interpersonal synchrony estimation. The MMASD dataset can be easily accessed at https://github.com/Li-Jicheng/MMASD-A-Multimodal-Dataset-for-Autism-Intervention-Analysis.  ( 2 min )
    Solving Large-scale Spatial Problems with Convolutional Neural Networks. (arXiv:2306.08191v1 [cs.LG])
    Over the past decade, deep learning research has been accelerated by increasingly powerful hardware, which facilitated rapid growth in the model complexity and the amount of data ingested. This is becoming unsustainable and therefore refocusing on efficiency is necessary. In this paper, we employ transfer learning to improve training efficiency for large-scale spatial problems. We propose that a convolutional neural network (CNN) can be trained on small windows of signals, but evaluated on arbitrarily large signals with little to no performance degradation, and provide a theoretical bound on the resulting generalization error. Our proof leverages shift-equivariance of CNNs, a property that is underexploited in transfer learning. The theoretical results are experimentally supported in the context of mobile infrastructure on demand (MID). The proposed approach is able to tackle MID at large scales with hundreds of agents, which was computationally intractable prior to this work.  ( 2 min )
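    The shift-equivariance property the proof leverages is easy to demonstrate: a filter learned on small windows produces the same outputs when its input is embedded inside a much larger signal. A minimal 1-D sketch (not the paper's experimental setup, and the filter here is random rather than trained):

```python
import numpy as np

def conv1d_valid(x, w):
    """'Valid' 1-D cross-correlation, the core CNN operation."""
    k = len(w)
    return np.array([x[i:i + k] @ w for i in range(len(x) - k + 1)])

rng = np.random.default_rng(0)
w = rng.normal(size=5)        # a filter "trained" on small windows
small = rng.normal(size=32)   # training-scale signal
# Embed the small signal inside a much larger one:
large = np.concatenate([rng.normal(size=100), small, rng.normal(size=100)])

out_small = conv1d_valid(small, w)
out_large = conv1d_valid(large, w)
# Shift-equivariance: the large-signal outputs at the embedded location
# match the small-signal outputs exactly.
gap = np.abs(out_large[100:100 + len(out_small)] - out_small).max()
```

    This is why a fully convolutional network trained on small windows can be evaluated on arbitrarily large signals; the paper's contribution is bounding the resulting generalization error.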
    POP: Prompt Of Prompts for Continual Learning. (arXiv:2306.08200v1 [cs.CV])
    Continual learning (CL) has attracted increasing attention in the recent past. It aims to mimic the human ability to learn new concepts without catastrophic forgetting. While existing CL methods accomplish this to some extent, they are still prone to semantic drift of the learned feature space. Foundation models, which are endowed with a robust feature representation, learned from very large datasets, provide an interesting substrate for the solution of the CL problem. Recent work has also shown that they can be adapted to specific tasks by prompt tuning techniques that leave the generality of the representation mostly unscathed. An open question is, however, how to learn both prompts that are task specific and prompts that are global, i.e. capture cross-task information. In this work, we propose the Prompt Of Prompts (POP) model, which addresses this goal by progressively learning a group of task-specified prompts and a group of global prompts, denoted as POP, to integrate information from the former. We show that a foundation model equipped with POP learning is able to outperform classic CL methods by a significant margin. Moreover, as prompt tuning only requires a small set of training samples, POP is able to perform CL in the few-shot setting, while still outperforming competing methods trained on the entire dataset.  ( 2 min )
    Re-thinking Model Inversion Attacks Against Deep Neural Networks. (arXiv:2304.01669v2 [cs.LG] UPDATED)
    Model inversion (MI) attacks aim to infer and reconstruct private training data by abusing access to a model. MI attacks have raised concerns about the leaking of sensitive information (e.g. private face images used in training a face recognition system). Recently, several algorithms for MI have been proposed to improve the attack performance. In this work, we revisit MI, study two fundamental issues pertaining to all state-of-the-art (SOTA) MI algorithms, and propose solutions to these issues which lead to a significant boost in attack performance for all SOTA MI. In particular, our contributions are two-fold: 1) We analyze the optimization objective of SOTA MI algorithms, argue that the objective is sub-optimal for achieving MI, and propose an improved optimization objective that boosts attack performance significantly. 2) We analyze "MI overfitting", show that it would prevent reconstructed images from learning semantics of training data, and propose a novel "model augmentation" idea to overcome this issue. Our proposed solutions are simple and improve all SOTA MI attack accuracy significantly. E.g., in the standard CelebA benchmark, our solutions improve accuracy by 11.8% and achieve for the first time over 90% attack accuracy. Our findings demonstrate that there is a clear risk of leaking sensitive information from deep learning models. We urge serious consideration to be given to the privacy implications. Our code, demo, and models are available at https://ngoc-nguyen-0.github.io/re-thinking_model_inversion_attacks/  ( 3 min )
    Fast exploration and learning of latent graphs with aliased observations. (arXiv:2303.07397v3 [cs.LG] UPDATED)
    We consider the problem of recovering a latent graph where the observations at each node are \emph{aliased}, and transitions are stochastic. Observations are gathered by an agent traversing the graph. Aliasing means that multiple nodes emit the same observation, so the agent can not know in which node it is located. The agent needs to uncover the hidden topology as accurately as possible and in as few steps as possible. This is equivalent to efficient recovery of the transition probabilities of a partially observable Markov decision process (POMDP) in which the observation probabilities are known. An algorithm for efficiently exploring (and ultimately recovering) the latent graph is provided. Our approach is exponentially faster than naive exploration in a variety of challenging topologies with aliased observations while remaining competitive with existing baselines in the unaliased regime.  ( 2 min )
    Prompt-Augmented Linear Probing: Scaling beyond the Limit of Few-shot In-Context Learners. (arXiv:2212.10873v3 [cs.CL] UPDATED)
    Through in-context learning (ICL), large-scale language models are effective few-shot learners without additional model fine-tuning. However, ICL performance does not scale well with the number of available training samples, as it is limited by the inherent input length constraint of the underlying language model. Meanwhile, many studies have revealed that language models are also powerful feature extractors, allowing them to be utilized in a black-box manner and enabling the linear probing paradigm, where lightweight discriminators are trained on top of the pre-extracted input representations. This paper proposes prompt-augmented linear probing (PALP), a hybrid of linear probing and ICL, which leverages the best of both worlds. PALP inherits the scalability of linear probing and the capability of enforcing language models to derive more meaningful representations via tailoring the input into a more conceivable form. Through in-depth investigations on various datasets, we verify that PALP significantly enhances the input representations, closing the gap between ICL in the data-hungry scenario and fine-tuning in the data-abundant scenario with little training overhead, potentially making PALP a strong alternative in a black-box scenario.  ( 2 min )
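    A rough sketch of the linear-probing half of PALP: prepend a task prompt before feature extraction, then train a lightweight probe on frozen features. The prompt text, the synthetic features standing in for LM embeddings, and the data below are all illustrative assumptions:

```python
import numpy as np

def prompt_augment(text, task_prompt="Sentiment of this review:"):
    """Prepend a task prompt so a frozen LM (not shown here) would produce
    more task-aligned features. The prompt text is a made-up example."""
    return f"{task_prompt} {text}"

def fit_linear_probe(X, y, lr=0.1, steps=500):
    """Plain logistic-regression probe trained on frozen features."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

# Synthetic stand-in for LM features of prompt-augmented inputs:
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(float)
w, b = fit_linear_probe(X, y)
acc = (((X @ w + b) > 0) == (y > 0.5)).mean()
```

    Unlike ICL, the probe's training-set size is not bounded by the model's context window, which is the scalability PALP inherits.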
    Machine-Learned Premise Selection for Lean. (arXiv:2304.00994v2 [cs.AI] UPDATED)
    We introduce a machine-learning-based tool for the Lean proof assistant that suggests relevant premises for theorems being proved by a user. The design principles for the tool are (1) tight integration with the proof assistant, (2) ease of use and installation, (3) a lightweight and fast approach. For this purpose, we designed a custom version of the random forest model, trained in an online fashion. It is implemented directly in Lean, which was possible thanks to the rich and efficient metaprogramming features of Lean 4. The random forest is trained on data extracted from mathlib -- Lean's mathematics library. We experiment with various options for producing training features and labels. The advice from a trained model is accessible to the user via the suggest_premises tactic which can be called in an editor while constructing a proof interactively.  ( 2 min )
    Contextual Multilingual Spellchecker for User Queries. (arXiv:2305.01082v2 [cs.CL] UPDATED)
    Spellchecking is one of the most fundamental and widely used search features. Correcting incorrectly spelled user queries not only enhances the user experience but is expected by the user. However, most widely available spellchecking solutions are either lower accuracy than state-of-the-art solutions or too slow to be used for search use cases where latency is a key requirement. Furthermore, most innovative recent architectures focus on English and are not trained in a multilingual fashion and are trained for spell correction in longer text, which is a different paradigm from spell correction for user queries, where context is sparse (most queries are 1-2 words long). Finally, since most enterprises have unique vocabularies such as product names, off-the-shelf spelling solutions fall short of users' needs. In this work, we build a multilingual spellchecker that is extremely fast and scalable and that adapts its vocabulary and hence speller output based on a specific product's needs. Furthermore, our speller out-performs general purpose spellers by a wide margin on in-domain datasets. Our multilingual speller is used in search in Adobe products, powering autocomplete in various applications.  ( 2 min )
    Scalable Adaptive Computation for Iterative Generation. (arXiv:2212.11972v2 [cs.LG] UPDATED)
    Natural data is redundant, yet predominant architectures tile computation uniformly across their input and output space. We propose the Recurrent Interface Networks (RINs), an attention-based architecture that decouples its core computation from the dimensionality of the data, enabling adaptive computation for more scalable generation of high-dimensional data. RINs focus the bulk of computation (i.e. global self-attention) on a set of latent tokens, using cross-attention to read and write (i.e. route) information between latent and data tokens. Stacking RIN blocks allows bottom-up (data to latent) and top-down (latent to data) feedback, leading to deeper and more expressive routing. While this routing introduces challenges, this is less problematic in recurrent computation settings where the task (and routing problem) changes gradually, such as iterative generation with diffusion models. We show how to leverage recurrence by conditioning the latent tokens at each forward pass of the reverse diffusion process with those from prior computation, i.e. latent self-conditioning. RINs yield state-of-the-art pixel diffusion models for image and video generation, scaling to 1024X1024 images without cascades or guidance, while being domain-agnostic and up to 10X more efficient than 2D and 3D U-Nets.  ( 2 min )
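    The read/process/write pattern around latent tokens can be sketched with plain single-head attention; the shapes and sizes below are illustrative, and real RIN blocks add learned projections, normalization, and MLPs:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, keys, values):
    """Single-head attention: queries read from (keys, values)."""
    scores = queries @ keys.T / np.sqrt(keys.shape[1])
    return softmax(scores, axis=-1) @ values

rng = np.random.default_rng(0)
data_tokens = rng.normal(size=(256, 32))  # many data tokens
latents = rng.normal(size=(16, 32))       # few latent tokens

# "Read": latents gather information from the data tokens (cheap: 16x256).
latents = cross_attend(latents, data_tokens, data_tokens)
# The bulk of computation (global self-attention) runs on latents only.
latents = cross_attend(latents, latents, latents)
# "Write": data tokens read the processed latents back (256x16).
data_tokens = cross_attend(data_tokens, latents, latents)
```

    Keeping quadratic self-attention on the 16 latents instead of the 256 data tokens is what decouples the core computation from the data dimensionality.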
    B-BACN: Bayesian Boundary-Aware Convolutional Network for Crack Characterization. (arXiv:2302.06827v3 [cs.CV] UPDATED)
    Accurately detecting crack boundaries is crucial for reliability assessment and risk management of structures and materials, such as structural health monitoring, diagnostics, prognostics, and maintenance scheduling. Uncertainty quantification of crack detection is challenging due to various stochastic factors, such as measurement noises, signal processing, and model simplifications. A machine learning-based approach is proposed to quantify both epistemic and aleatoric uncertainties concurrently. We introduce a Bayesian Boundary-Aware Convolutional Network (B-BACN) that emphasizes uncertainty-aware boundary refinement to generate precise and reliable crack boundary detections. The proposed method employs a multi-task learning approach, where we use Monte Carlo Dropout to learn the epistemic uncertainty and a Gaussian sampling function to predict each sample's aleatoric uncertainty. Moreover, we include a boundary refinement loss to B-BACN to enhance the determination of defect boundaries. The proposed method is demonstrated with benchmark experimental results and compared with several existing methods. The experimental results illustrate the effectiveness of our proposed approach in uncertainty-aware crack boundary detection, minimizing misclassification rate, and improving model calibration capabilities.  ( 2 min )
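    Monte Carlo Dropout, the epistemic-uncertainty component mentioned above, can be sketched minimally as repeated stochastic forward passes through a single linear layer; this is a generic stand-in, not B-BACN itself, and all shapes and values are illustrative:

```python
import numpy as np

def mc_dropout_predict(x, weights, p=0.5, n_samples=100, rng=None):
    """Monte Carlo Dropout: keep dropout active at inference, average over
    stochastic forward passes, and read epistemic uncertainty off the
    spread of the predictions."""
    rng = rng if rng is not None else np.random.default_rng(0)
    outs = []
    for _ in range(n_samples):
        mask = rng.random(weights.shape) > p         # drop weights w.p. p
        outs.append(x @ (weights * mask) / (1 - p))  # inverted-dropout scaling
    outs = np.array(outs)
    return outs.mean(axis=0), outs.std(axis=0)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 1))
x = rng.normal(size=(1, 8))
mean_pred, epistemic_std = mc_dropout_predict(x, w, rng=rng)
```

    Aleatoric uncertainty, by contrast, is predicted per sample (in the paper, via a Gaussian sampling head) rather than inferred from weight stochasticity.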
    MedNeXt: Transformer-driven Scaling of ConvNets for Medical Image Segmentation. (arXiv:2303.09975v3 [eess.IV] UPDATED)
    There has been exploding interest in embracing Transformer-based architectures for medical image segmentation. However, the lack of large-scale annotated medical datasets makes achieving performance equivalent to that on natural images challenging. Convolutional networks, in contrast, have higher inductive biases and consequently, are easily trainable to high performance. Recently, the ConvNeXt architecture attempted to modernize the standard ConvNet by mirroring Transformer blocks. In this work, we improve upon this to design a modernized and scalable convolutional architecture customized to challenges of data-scarce medical settings. We introduce MedNeXt, a Transformer-inspired large kernel segmentation network which introduces - 1) A fully ConvNeXt 3D Encoder-Decoder Network for medical image segmentation, 2) Residual ConvNeXt up and downsampling blocks to preserve semantic richness across scales, 3) A novel technique to iteratively increase kernel sizes by upsampling small kernel networks, to prevent performance saturation on limited medical data, 4) Compound scaling at multiple levels (depth, width, kernel size) of MedNeXt. This leads to state-of-the-art performance on 4 tasks on CT and MRI modalities and varying dataset sizes, representing a modernized deep architecture for medical image segmentation. Our code is made publicly available at: https://github.com/MIC-DKFZ/MedNeXt.  ( 2 min )
    The Training Process of Many Deep Networks Explores the Same Low-Dimensional Manifold. (arXiv:2305.01604v2 [cs.LG] UPDATED)
    We develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training. By examining the underlying high-dimensional probabilistic models, we reveal that the training process explores an effectively low-dimensional manifold. Networks with a wide range of architectures, sizes, trained using different optimization methods, regularization techniques, data augmentation techniques, and weight initializations lie on the same manifold in the prediction space. We study the details of this manifold to find that networks with different architectures follow distinguishable trajectories but other factors have a minimal influence; larger networks train along a similar manifold as that of smaller networks, just faster; and networks initialized at very different parts of the prediction space converge to the solution along a similar manifold.  ( 2 min )
    B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding. (arXiv:2304.10577v2 [cs.LG] UPDATED)
    Estimating heterogeneous treatment effects from observational data is a crucial task across many fields, helping policy and decision-makers take better actions. There has been recent progress on robust and efficient methods for estimating the conditional average treatment effect (CATE) function, but these methods often do not take into account the risk of hidden confounding, which could arbitrarily and unknowingly bias any causal estimate based on observational data. We propose a meta-learner called the B-Learner, which can efficiently learn sharp bounds on the CATE function under limits on the level of hidden confounding. We derive the B-Learner by adapting recent results for sharp and valid bounds of the average treatment effect (Dorn et al., 2021) into the framework given by Kallus & Oprescu (2023) for robust and model-agnostic learning of conditional distributional treatment effects. The B-Learner can use any function estimator such as random forests and deep neural networks, and we prove its estimates are valid, sharp, efficient, and have a quasi-oracle property with respect to the constituent estimators under more general conditions than existing methods. Semi-synthetic experimental comparisons validate the theoretical findings, and we use real-world data to demonstrate how the method might be used in practice.  ( 2 min )
    Pretraining Language Models with Human Preferences. (arXiv:2302.08582v2 [cs.CL] UPDATED)
    Language models (LMs) are pretrained to imitate internet text, including content that would violate human preferences if generated by an LM: falsehoods, offensive comments, personally identifiable information, low-quality or buggy code, and more. Here, we explore alternative objectives for pretraining LMs in a way that also guides them to generate text aligned with human preferences. We benchmark five objectives for pretraining with human feedback across three tasks and study how they affect the trade-off between alignment and capabilities of pretrained LMs. We find a Pareto-optimal and simple approach among those we explored: conditional training, or learning distribution over tokens conditional on their human preference scores given by a reward model. Conditional training reduces the rate of undesirable content by up to an order of magnitude, both when generating without a prompt and with an adversarially-chosen prompt. Moreover, conditional training maintains the downstream task performance of standard LM pretraining, both before and after task-specific finetuning. Pretraining with human feedback results in much better preference satisfaction than standard LM pretraining followed by finetuning with feedback, i.e., learning and then unlearning undesirable behavior. Our results suggest that we should move beyond imitation learning when pretraining LMs and incorporate human preferences from the start of training.  ( 2 min )
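    Conditional training can be sketched as simple data preprocessing: prefix each training sequence with a control token derived from a reward-model score. The token names and threshold below are hypothetical:

```python
def make_conditional_example(tokens, reward_score, threshold=0.0):
    """Prefix a training sequence with a control token derived from a
    reward-model score, so the LM learns p(tokens | preference token).
    The token names and the threshold are hypothetical choices."""
    control = "<|good|>" if reward_score >= threshold else "<|bad|>"
    return [control] + tokens

ex = make_conditional_example(["the", "movie", "was", "great"],
                              reward_score=0.8)
# At generation time, conditioning on "<|good|>" steers sampling toward
# text the reward model scores as human-preferred.
```

    Because the undesirable data is still seen during pretraining (just labeled), the model retains capability while learning to generate preferred text on demand.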
    Towards Model-Agnostic Federated Learning over Networks. (arXiv:2302.04363v2 [cs.LG] UPDATED)
We present a model-agnostic federated learning method for networks of heterogeneous data and models. The network structure reflects similarities between the (statistics of) local datasets and, in turn, their associated local ("personal") models. Our method is an instance of empirical risk minimization, with the regularization term derived from the network structure of data. In particular, we require well-connected local models, forming clusters, to yield similar predictions on a common test set. The proposed method allows for a wide range of local models. The only restriction on these local models is that they allow for efficient implementation of regularized empirical risk minimization (training). For a wide range of models, such implementations are available in high-level programming libraries including scikit-learn, Keras or PyTorch.  ( 2 min )
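As a rough illustration (not the paper's implementation), the model-agnostic regularizer only needs each local model's predictions on a common test set: the penalty sums squared prediction disagreements over weighted edges of the data network. Linear local models and the chain graph below are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, d = 3, 4
X_test = rng.normal(size=(10, d))          # common unlabeled test set
W = rng.normal(size=(n_nodes, d))          # one linear model per node
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)  # chain graph

def network_penalty(W, X_test, A):
    """Penalize prediction disagreement between network-adjacent models."""
    preds = X_test @ W.T                   # (n_test, n_nodes)
    pen = 0.0
    for i in range(len(W)):
        for j in range(len(W)):
            pen += A[i, j] * np.sum((preds[:, i] - preds[:, j]) ** 2)
    return 0.5 * pen                       # each undirected edge counted once
```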
    Kernel Debiased Plug-in Estimation. (arXiv:2306.08598v1 [stat.ME])
We consider the problem of estimating a scalar target parameter in the presence of nuisance parameters. Replacing the unknown nuisance parameter with a nonparametric estimator, e.g., a machine learning (ML) model, is convenient but has been shown to be inefficient due to large biases. Modern methods, such as the targeted minimum loss-based estimation (TMLE) and double machine learning (DML), achieve optimal performance under flexible assumptions by harnessing ML estimates while mitigating the plug-in bias. To avoid a sub-optimal bias-variance trade-off, these methods perform a debiasing step of the plug-in pre-estimate. Existing debiasing methods require the influence function (IF) of the target parameter as input. However, deriving the IF requires specialized expertise and thus hinders the adoption of these methods by practitioners. We propose a novel way to debias plug-in estimators which (i) is efficient, (ii) does not require the IF to be implemented, (iii) is computationally tractable, and therefore can be readily adapted to new estimation problems and automated without analytic derivations by the user. We build on the TMLE framework and update a plug-in estimate with a regularized likelihood maximization step over a nonparametric model constructed with a reproducing kernel Hilbert space (RKHS), producing an efficient plug-in estimate for any regular target parameter. Our method, thus, offers the efficiency of competing debiasing techniques without sacrificing the utility of the plug-in approach.  ( 2 min )
    Robust Sample Weighting to Facilitate Individualized Treatment Rule Learning for a Target Population. (arXiv:2105.00581v2 [stat.ML] UPDATED)
    Learning individualized treatment rules (ITRs) is an important topic in precision medicine. Current literature mainly focuses on deriving ITRs from a single source population. We consider the observational data setting when the source population differs from a target population of interest. Compared with causal generalization for the average treatment effect which is a scalar quantity, ITR generalization poses new challenges due to the need to model and generalize the rules based on a prespecified class of functions which may not contain the unrestricted true optimal ITR. The aim of this paper is to develop a weighting framework to mitigate the impact of such misspecification and thus facilitate the generalizability of optimal ITRs from a source population to a target population. Our method seeks covariate balance over a non-parametric function class characterized by a reproducing kernel Hilbert space and can improve many ITR learning methods that rely on weights. We show that the proposed method encompasses importance weights and overlap weights as two extreme cases, allowing for a better bias-variance trade-off in between. Numerical examples demonstrate that the use of our weighting method can greatly improve ITR estimation for the target population compared with other weighting methods.  ( 2 min )
    Leveraging Evolutionary Changes for Software Process Quality. (arXiv:2305.18061v2 [cs.SE] UPDATED)
    Real-world software applications must constantly evolve to remain relevant. This evolution occurs when developing new applications or adapting existing ones to meet new requirements, make corrections, or incorporate future functionality. Traditional methods of software quality control involve software quality models and continuous code inspection tools. These measures focus on directly assessing the quality of the software. However, there is a strong correlation and causation between the quality of the development process and the resulting software product. Therefore, improving the development process indirectly improves the software product, too. To achieve this, effective learning from past processes is necessary, often embraced through post mortem organizational learning. While qualitative evaluation of large artifacts is common, smaller quantitative changes captured by application lifecycle management are often overlooked. In addition to software metrics, these smaller changes can reveal complex phenomena related to project culture and management. Leveraging these changes can help detect and address such complex issues. Software evolution was previously measured by the size of changes, but the lack of consensus on a reliable and versatile quantification method prevents its use as a dependable metric. Different size classifications fail to reliably describe the nature of evolution. While application lifecycle management data is rich, identifying which artifacts can model detrimental managerial practices remains uncertain. Approaches such as simulation modeling, discrete events simulation, or Bayesian networks have only limited ability to exploit continuous-time process models of such phenomena. Even worse, the accessibility and mechanistic insight into such gray- or black-box models are typically very low. To address these challenges, we suggest leveraging objectively [...]  ( 3 min )
    Multi-Loss Convolutional Network with Time-Frequency Attention for Speech Enhancement. (arXiv:2306.08956v1 [cs.SD])
The Dual-Path Convolution Recurrent Network (DPCRN) was proposed to effectively exploit time-frequency domain information. By combining the DPRNN module with Convolution Recurrent Network (CRN), the DPCRN obtained a promising performance in speech separation with a limited model size. In this paper, we explore self-attention in the DPCRN module and design a model called Multi-Loss Convolutional Network with Time-Frequency Attention (MNTFA) for speech enhancement. We use self-attention modules to exploit long-range temporal information, where the intra-chunk self-attentions are used to model the spectrum pattern and the inter-chunk self-attentions are used to model the dependence between consecutive frames. Compared to DPRNN, axial self-attention greatly reduces the need for memory and computation, which is more suitable for long sequences of speech signals. In addition, we propose a joint training method of a multi-resolution STFT loss and a WavLM loss using a pre-trained WavLM network. Experiments show that with only 0.23M parameters, the proposed model achieves a better performance than DPCRN.  ( 2 min )
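A multi-resolution STFT loss can be sketched as spectral L1 terms averaged over several FFT sizes. This minimal NumPy version is an assumption-laden sketch: the window sizes and the plain magnitude-L1 form are illustrative, and the WavLM feature loss is omitted.

```python
import numpy as np

def stft_mag(x, n_fft, hop):
    """Naive framed magnitude STFT with a Hann window."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=-1))

def multi_res_stft_loss(pred, target, resolutions=((512, 128), (1024, 256))):
    """Average spectral L1 distance over several (n_fft, hop) resolutions."""
    loss = 0.0
    for n_fft, hop in resolutions:
        P, T = stft_mag(pred, n_fft, hop), stft_mag(target, n_fft, hop)
        loss += np.mean(np.abs(P - T))
    return loss / len(resolutions)
```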
    Revisiting the Gumbel-Softmax in MADDPG. (arXiv:2302.11793v2 [cs.LG] UPDATED)
    MADDPG is an algorithm in multi-agent reinforcement learning (MARL) that extends the popular single-agent method, DDPG, to multi-agent scenarios. Importantly, DDPG is an algorithm designed for continuous action spaces, where the gradient of the state-action value function exists. For this algorithm to work in discrete action spaces, discrete gradient estimation must be performed. For MADDPG, the Gumbel-Softmax (GS) estimator is used -- a reparameterisation which relaxes a discrete distribution into a similar continuous one. This method, however, is statistically biased, and a recent MARL benchmarking paper suggests that this bias makes MADDPG perform poorly in grid-world situations, where the action space is discrete. Fortunately, many alternatives to the GS exist, boasting a wide range of properties. This paper explores several of these alternatives and integrates them into MADDPG for discrete grid-world scenarios. The corresponding impact on various performance metrics is then measured and analysed. It is found that one of the proposed estimators performs significantly better than the original GS in several tasks, achieving up to 55% higher returns, along with faster convergence.  ( 2 min )
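For reference, the Gumbel-Softmax relaxation itself is only a few lines: perturb the logits with Gumbel noise, then soften the argmax with a temperature-controlled softmax. This is a generic sketch of the estimator, not MADDPG-specific code.

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Draw one relaxed (continuous) sample from a categorical distribution."""
    rng = rng or np.random.default_rng()
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0, 1) noise
    y = (logits + g) / tau
    e = np.exp(y - y.max())
    return e / e.sum()  # as tau -> 0 this approaches a one-hot argmax sample
```

Lower temperatures give samples closer to discrete one-hot actions at the cost of higher-variance (and, as the abstract notes, biased) gradients.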
    Optimistic Planning by Regularized Dynamic Programming. (arXiv:2302.14004v3 [cs.LG] UPDATED)
    We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure. This technique allows us to avoid contraction and monotonicity arguments typically required by existing analyses of approximate dynamic programming methods, and in particular to use approximate transition functions estimated via least-squares procedures in MDPs with linear function approximation. We use our method to recover known guarantees in tabular MDPs and to provide a computationally efficient algorithm for learning near-optimal policies in discounted linear mixture MDPs from a single stream of experience, and show it achieves near-optimal statistical guarantees.  ( 2 min )
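One standard instance of regularized dynamic programming (not the paper's exact planner, which adds regularization for optimism with estimated transitions) replaces the hard max in value iteration with a log-sum-exp, i.e. entropy-regularized, backup:

```python
import numpy as np

S, A, gamma, tau = 3, 2, 0.9, 0.1        # toy MDP; sizes are arbitrary
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] is a next-state distribution
R = rng.uniform(size=(S, A))                 # random rewards

V = np.zeros(S)
for _ in range(500):
    Q = R + gamma * P @ V                    # (S, A) action values
    V = tau * np.log(np.exp(Q / tau).sum(axis=1))  # soft (log-sum-exp) backup
```

The soft backup is always at least the hard max, which is the sense in which regularization can serve optimism.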
    Contextual Combinatorial Bandits with Probabilistically Triggered Arms. (arXiv:2303.17110v2 [cs.LG] UPDATED)
    We study contextual combinatorial bandits with probabilistically triggered arms (C$^2$MAB-T) under a variety of smoothness conditions that capture a wide range of applications, such as contextual cascading bandits and contextual influence maximization bandits. Under the triggering probability modulated (TPM) condition, we devise the C$^2$-UCB-T algorithm and propose a novel analysis that achieves an $\tilde{O}(d\sqrt{KT})$ regret bound, removing a potentially exponentially large factor $O(1/p_{\min})$, where $d$ is the dimension of contexts, $p_{\min}$ is the minimum positive probability that any arm can be triggered, and batch-size $K$ is the maximum number of arms that can be triggered per round. Under the variance modulated (VM) or triggering probability and variance modulated (TPVM) conditions, we propose a new variance-adaptive algorithm VAC$^2$-UCB and derive a regret bound $\tilde{O}(d\sqrt{T})$, which is independent of the batch-size $K$. As a valuable by-product, our analysis technique and variance-adaptive algorithm can be applied to the CMAB-T and C$^2$MAB setting, improving existing results there as well. We also include experiments that demonstrate the improved performance of our algorithms compared with benchmark algorithms on synthetic and real-world datasets.
    Optimal Best-Arm Identification in Bandits with Access to Offline Data. (arXiv:2306.09048v1 [cs.LG])
    Learning paradigms based purely on offline data as well as those based solely on sequential online learning have been well-studied in the literature. In this paper, we consider combining offline data with online learning, an area less studied but of obvious practical importance. We consider the stochastic $K$-armed bandit problem, where our goal is to identify the arm with the highest mean in the presence of relevant offline data, with confidence $1-\delta$. We conduct a lower bound analysis on policies that provide such $1-\delta$ probabilistic correctness guarantees. We develop algorithms that match the lower bound on sample complexity when $\delta$ is small. Our algorithms are computationally efficient with an average per-sample acquisition cost of $\tilde{O}(K)$, and rely on a careful characterization of the optimality conditions of the lower bound problem.
    Optimal Exploration for Model-Based RL in Nonlinear Systems. (arXiv:2306.09210v1 [cs.LG])
    Learning to control unknown nonlinear dynamical systems is a fundamental problem in reinforcement learning and control theory. A commonly applied approach is to first explore the environment (exploration), learn an accurate model of it (system identification), and then compute an optimal controller with the minimum cost on this estimated system (policy optimization). While existing work has shown that it is possible to learn a uniformly good model of the system~\citep{mania2020active}, in practice, if we aim to learn a good controller with a low cost on the actual system, certain system parameters may be significantly more critical than others, and we therefore ought to focus our exploration on learning such parameters. In this work, we consider the setting of nonlinear dynamical systems and seek to formally quantify, in such settings, (a) which parameters are most relevant to learning a good controller, and (b) how we can best explore so as to minimize uncertainty in such parameters. Inspired by recent work in linear systems~\citep{wagenmaker2021task}, we show that minimizing the controller loss in nonlinear systems translates to estimating the system parameters in a particular, task-dependent metric. Motivated by this, we develop an algorithm able to efficiently explore the system to reduce uncertainty in this metric, and prove a lower bound showing that our approach learns a controller at a near-instance-optimal rate. Our algorithm relies on a general reduction from policy optimization to optimal experiment design in arbitrary systems, and may be of independent interest. We conclude with experiments demonstrating the effectiveness of our method in realistic nonlinear robotic systems.
    Large language models predict human sensory judgments across six modalities. (arXiv:2302.01308v2 [cs.CL] UPDATED)
Determining the extent to which the perceptual world can be recovered from language is a longstanding problem in philosophy and cognitive science. We show that state-of-the-art large language models can unlock new insights into this problem by providing a lower bound on the amount of perceptual information that can be extracted from language. Specifically, we elicit pairwise similarity judgments from GPT models across six psychophysical datasets. We show that the judgments are significantly correlated with human data across all domains, recovering well-known representations like the color wheel and pitch spiral. Surprisingly, we find that a model (GPT-4) co-trained on vision and language does not necessarily lead to improvements specific to the visual modality. To study the influence of specific languages on perception, we also apply the models to a multilingual color-naming task. We find that GPT-4 replicates cross-linguistic variation in English and Russian, illuminating the interaction of language and perception.
    Investigating the Impact of Model Width and Density on Generalization in Presence of Label Noise. (arXiv:2208.08003v4 [cs.LG] UPDATED)
Increasing the size of overparameterized neural networks has been key to achieving state-of-the-art performance. This is captured by the double descent phenomenon, where the test loss follows a decreasing-increasing-decreasing pattern as model width increases. However, the effect of label noise on the test loss curve has not been fully explored. In this work, we uncover an intriguing phenomenon where label noise leads to a \textit{final ascent} in the originally observed double descent curve. Specifically, under a sufficiently large noise-to-sample-size ratio, optimal generalization is achieved at intermediate widths. Through theoretical analysis, we attribute this phenomenon to the shape transition of test loss variance induced by label noise. Furthermore, we extend the final ascent phenomenon to model density and provide the first theoretical characterization showing that reducing density by randomly dropping trainable parameters improves generalization under label noise. We also thoroughly examine the roles of regularization and sample size. Surprisingly, we find that larger $\ell_2$ regularization and robust learning methods against label noise exacerbate the final ascent. We confirm the validity of our findings through extensive experiments on ReLU networks trained on MNIST, ResNets trained on CIFAR-10/100, and InceptionResNet-v2 trained on Stanford Cars with real-world noisy labels.
    Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning. (arXiv:2306.08803v1 [cs.LG])
    Thompson sampling (TS) is widely used in sequential decision making due to its ease of use and appealing empirical performance. However, many existing analytical and empirical results for TS rely on restrictive assumptions on reward distributions, such as belonging to conjugate families, which limits their applicability in realistic scenarios. Moreover, sequential decision making problems are often carried out in a batched manner, either due to the inherent nature of the problem or to serve the purpose of reducing communication and computation costs. In this work, we jointly study these problems in two popular settings, namely, stochastic multi-armed bandits (MABs) and infinite-horizon reinforcement learning (RL), where TS is used to learn the unknown reward distributions and transition dynamics, respectively. We propose batched $\textit{Langevin Thompson Sampling}$ algorithms that leverage MCMC methods to sample from approximate posteriors with only logarithmic communication costs in terms of batches. Our algorithms are computationally efficient and maintain the same order-optimal regret guarantees of $\mathcal{O}(\log T)$ for stochastic MABs, and $\mathcal{O}(\sqrt{T})$ for RL. We complement our theoretical findings with experimental results.
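The MCMC ingredient can be sketched for a single Gaussian arm: unadjusted Langevin steps on the log-posterior produce the approximate posterior sample that Thompson sampling would then act on. The standard-normal prior, unit-variance likelihood, step size, and step count below are illustrative assumptions, not the paper's tuned settings.

```python
import numpy as np

def langevin_posterior_sample(rewards, steps=200, step_size=0.01, rng=None):
    """One unadjusted-Langevin pass targeting the posterior of a Gaussian
    arm mean with N(0, 1) prior and N(theta, 1) likelihood."""
    rng = rng or np.random.default_rng(0)
    theta = 0.0
    for _ in range(steps):
        # gradient of the log posterior: prior term plus likelihood term
        grad = -theta + np.sum(rewards - theta)
        theta += 0.5 * step_size * grad + np.sqrt(step_size) * rng.normal()
    return theta
```

With n observations of an arm, the exact posterior mean is n * mean(rewards) / (n + 1), and the Langevin sample should land near it.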
    Visualizing Deep Neural Networks with Topographic Activation Maps. (arXiv:2204.03528v2 [cs.LG] UPDATED)
Machine Learning with Deep Neural Networks (DNNs) has become a successful tool in solving tasks across various fields of application. However, the complexity of DNNs makes it difficult to understand how they solve their learned task. To improve the explainability of DNNs, we adapt methods from neuroscience that analyze complex and opaque systems. Here, we draw inspiration from how neuroscience uses topographic maps to visualize brain activity. To also visualize activations of neurons in DNNs as topographic maps, we research techniques to lay out the neurons in a two-dimensional space such that neurons of similar activity are in the vicinity of each other. In this work, we introduce and compare methods to obtain a topographic layout of neurons in a DNN layer. Moreover, we demonstrate how to use topographic activation maps to identify errors or encoded biases and to visualize training processes. Our novel visualization technique improves the transparency of DNN-based decision-making systems and is interpretable without expert knowledge in Machine Learning.
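One simple way to obtain such a layout (the paper compares several techniques; classical multidimensional scaling is just one candidate) is to embed neurons into 2-D from their pairwise activation distances, so similarly-activating units land near each other:

```python
import numpy as np

rng = np.random.default_rng(0)
acts = rng.normal(size=(8, 100))             # 8 neurons x 100 stimuli (toy data)

# Classical MDS: double-center the squared-distance matrix, then take the
# top-2 eigenvectors of the resulting Gram matrix as 2-D coordinates.
D2 = np.sum((acts[:, None] - acts[None]) ** 2, axis=-1)  # squared distances
J = np.eye(8) - np.ones((8, 8)) / 8
B = -0.5 * J @ D2 @ J                        # double-centered Gram matrix
vals, vecs = np.linalg.eigh(B)               # eigenvalues in ascending order
layout = vecs[:, -2:] * np.sqrt(np.maximum(vals[-2:], 0))  # top-2 coordinates
```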
    Spherical Inducing Features for Orthogonally-Decoupled Gaussian Processes. (arXiv:2304.14034v2 [cs.LG] UPDATED)
    Despite their many desirable properties, Gaussian processes (GPs) are often compared unfavorably to deep neural networks (NNs) for lacking the ability to learn representations. Recent efforts to bridge the gap between GPs and deep NNs have yielded a new class of inter-domain variational GPs in which the inducing variables correspond to hidden units of a feedforward NN. In this work, we examine some practical issues associated with this approach and propose an extension that leverages the orthogonal decomposition of GPs to mitigate these limitations. In particular, we introduce spherical inter-domain features to construct more flexible data-dependent basis functions for both the principal and orthogonal components of the GP approximation and show that incorporating NN activation features under this framework not only alleviates these shortcomings but is more scalable than alternative strategies. Experiments on multiple benchmark datasets demonstrate the effectiveness of our approach.
    The Neural Covariance SDE: Shaped Infinite Depth-and-Width Networks at Initialization. (arXiv:2206.02768v3 [stat.ML] UPDATED)
    The logit outputs of a feedforward neural network at initialization are conditionally Gaussian, given a random covariance matrix defined by the penultimate layer. In this work, we study the distribution of this random matrix. Recent work has shown that shaping the activation function as network depth grows large is necessary for this covariance matrix to be non-degenerate. However, the current infinite-width-style understanding of this shaping method is unsatisfactory for large depth: infinite-width analyses ignore the microscopic fluctuations from layer to layer, but these fluctuations accumulate over many layers. To overcome this shortcoming, we study the random covariance matrix in the shaped infinite-depth-and-width limit. We identify the precise scaling of the activation function necessary to arrive at a non-trivial limit, and show that the random covariance matrix is governed by a stochastic differential equation (SDE) that we call the Neural Covariance SDE. Using simulations, we show that the SDE closely matches the distribution of the random covariance matrix of finite networks. Additionally, we recover an if-and-only-if condition for exploding and vanishing norms of large shaped networks based on the activation function.
    Differentially Private Domain Adaptation with Theoretical Guarantees. (arXiv:2306.08838v1 [cs.LG])
    In many applications, the labeled data at the learner's disposal is subject to privacy constraints and is relatively limited. To derive a more accurate predictor for the target domain, it is often beneficial to leverage publicly available labeled data from an alternative domain, somewhat close to the target domain. This is the modern problem of supervised domain adaptation from a public source to a private target domain. We present two $(\epsilon, \delta)$-differentially private adaptation algorithms for supervised adaptation, for which we make use of a general optimization problem, recently shown to benefit from favorable theoretical learning guarantees. Our first algorithm is designed for regression with linear predictors and shown to solve a convex optimization problem. Our second algorithm is a more general solution for loss functions that may be non-convex but Lipschitz and smooth. While our main objective is a theoretical analysis, we also report the results of several experiments first demonstrating that the non-private versions of our algorithms outperform adaptation baselines and next showing that, for larger values of the target sample size or $\epsilon$, the performance of our private algorithms remains close to that of the non-private formulation.
    Short-Term Density Forecasting of Low-Voltage Load using Bernstein-Polynomial Normalizing Flows. (arXiv:2204.13939v3 [cs.LG] UPDATED)
    The transition to a fully renewable energy grid requires better forecasting of demand at the low-voltage level to increase efficiency and ensure reliable control. However, high fluctuations and increasing electrification cause huge forecast variability, not reflected in traditional point estimates. Probabilistic load forecasts take future uncertainties into account and thus allow more informed decision-making for the planning and operation of low-carbon energy systems. We propose an approach for flexible conditional density forecasting of short-term load based on Bernstein polynomial normalizing flows, where a neural network controls the parameters of the flow. In an empirical study with 363 smart meter customers, our density predictions compare favorably against Gaussian and Gaussian mixture densities. Also, they outperform a non-parametric approach based on the pinball loss for 24h-ahead load forecasting for two different neural network architectures.
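For reference, the pinball (quantile) loss that the non-parametric baseline optimizes is, for a quantile level tau:

```python
import numpy as np

def pinball_loss(y, q_pred, tau):
    """Quantile (pinball) loss: penalizes under- and over-prediction
    asymmetrically according to the target quantile level tau."""
    diff = y - q_pred
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))
```

At tau = 0.5 it reduces to half the mean absolute error, recovering median regression as a special case.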
    We Should at Least Be Able to Design Molecules That Dock Well. (arXiv:2006.16955v5 [q-bio.BM] UPDATED)
Designing compounds with desired properties is a key element of the drug discovery process. However, measuring progress in the field has been challenging due to the lack of realistic retrospective benchmarks, and the large cost of prospective validation. To close this gap, we propose a benchmark based on docking, a popular computational method for assessing molecule binding to a protein. Concretely, the goal is to generate drug-like molecules that are scored highly by SMINA, a popular docking software. We observe that popular graph-based generative models fail to generate molecules with a high docking score when trained using a realistically sized training set. This suggests a limitation of the current incarnation of models for de novo drug design. Finally, we propose a simplified version of the benchmark based on a simpler scoring function, and show that the tested models are able to partially solve it. We release the benchmark as an easy-to-use package available at https://github.com/cieplinski-tobiasz/smina-docking-benchmark. We hope that our benchmark will serve as a stepping stone towards the goal of automatically generating promising drug candidates.  ( 2 min )
    Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks. (arXiv:2202.00293v4 [stat.ML] UPDATED)
    Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in badly-generalizing local minima. Here we investigate the cross-over between these two regimes in the high-dimensional setting, and in particular investigate the connection between the so-called mean-field/hydrodynamic regime and the seminal approach of Saad & Solla. Focusing on the case of Gaussian data, we study the interplay between the learning rate, the time scale, and the number of hidden units in the high-dimensional dynamics of stochastic gradient descent (SGD). Our work builds on a deterministic description of SGD in high-dimensions from statistical physics, which we extend and for which we provide rigorous convergence rates.
    Intrinsic Sliced Wasserstein Distances for Comparing Collections of Probability Distributions on Manifolds and Graphs. (arXiv:2010.15285v3 [stat.ME] UPDATED)
Collections of probability distributions arise in a variety of applications ranging from user activity pattern analysis to brain connectomics. In practice these distributions can be defined over diverse domain types including finite intervals, circles, cylinders, spheres, other manifolds, and graphs. This paper introduces an approach for detecting differences between two collections of distributions over such general domains. To this end, we propose the intrinsic slicing construction that yields a novel class of Wasserstein distances on manifolds and graphs. These distances are Hilbert embeddable, allowing us to reduce the distribution collection comparison problem to a more familiar mean testing problem in a Hilbert space. We provide two testing procedures: one based on resampling and another based on combining p-values from coordinate-wise tests. Our experiments in various synthetic and real data settings show that the resulting tests are powerful and the p-values are well-calibrated.  ( 2 min )
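In the familiar Euclidean case (the paper's contribution is the intrinsic construction on manifolds and graphs, which this sketch does not cover), the sliced Wasserstein distance averages 1-D transport costs over random projection directions:

```python
import numpy as np

def sliced_wasserstein(X, Y, n_proj=50, rng=None):
    """Monte-Carlo sliced 2-Wasserstein distance between two equal-size
    samples in R^d, using sorted projections for the 1-D transport cost."""
    rng = rng or np.random.default_rng(0)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)           # random direction on sphere
        px, py = np.sort(X @ theta), np.sort(Y @ theta)
        total += np.mean((px - py) ** 2)         # 1-D W2^2 via sorted samples
    return np.sqrt(total / n_proj)
```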
    Analysis and Approximate Inference of Large and Dense Random Kronecker Graphs. (arXiv:2306.08489v1 [stat.ML])
Random graph models play an increasingly important role in science and industry, and find applications in a variety of fields ranging from social and traffic networks, to recommendation systems and molecular genetics. In this paper, we perform an in-depth analysis of the random Kronecker graph model proposed in \cite{leskovec2010kronecker}, when the number of graph vertices $N$ is large. Built upon recent advances in random matrix theory, we show, in the dense regime, that the random Kronecker graph adjacency matrix follows approximately a signal-plus-noise model, with a small-rank (of order at most $\log N$) signal matrix that is linear in the graph parameters and a random noise matrix having a quarter-circle-form singular value distribution. This observation allows us to propose a ``denoise-and-solve'' meta algorithm to approximately infer the graph parameters, with reduced computational complexity and (asymptotic) performance guarantee. Numerical experiments of graph inference and graph classification on both synthetic and realistic graphs are provided to support the advantageous performance of the proposed approach.  ( 2 min )
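The generative model under analysis is easy to write down: a small seed matrix of edge probabilities is raised to a Kronecker power, and each edge is then sampled independently. The seed values and graph size below are arbitrary choices for the sketch.

```python
import numpy as np

seed_P = np.array([[0.9, 0.6],
                   [0.6, 0.3]])          # 2x2 seed of edge probabilities
k = 8                                    # resulting graph has 2**k = 256 vertices

P = seed_P
for _ in range(k - 1):
    P = np.kron(P, seed_P)               # k-th Kronecker power of the seed

rng = np.random.default_rng(0)
U = rng.uniform(size=P.shape)
A = np.triu((U < P).astype(int), 1)      # sample edges in the upper triangle
A = A + A.T                              # symmetric adjacency, no self-loops
```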
    Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models. (arXiv:2306.09251v1 [stat.ML])
    Diffusion models, which convert noise into new data instances by learning to reverse a Markov diffusion process, have become a cornerstone in contemporary generative modeling. While their practical power has now been widely recognized, the theoretical underpinnings remain far from mature. In this work, we develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models in discrete time, assuming access to reliable estimates of the (Stein) score functions. For a popular deterministic sampler (based on the probability flow ODE), we establish a convergence rate proportional to $1/T$ (with $T$ the total number of steps), improving upon past results; for another mainstream stochastic sampler (i.e., a type of the denoising diffusion probabilistic model (DDPM)), we derive a convergence rate proportional to $1/\sqrt{T}$, matching the state-of-the-art theory. Our theory imposes only minimal assumptions on the target data distribution (e.g., no smoothness assumption is imposed), and is developed based on an elementary yet versatile non-asymptotic approach without resorting to toolboxes for SDEs and ODEs. Further, we design two accelerated variants, improving the convergence to $1/T^2$ for the ODE-based sampler and $1/T$ for the DDPM-type sampler, which might be of independent theoretical and empirical interest.
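A toy version of the deterministic sampler can be checked end to end when the score is known exactly: for Gaussian data under a variance-preserving diffusion (constant beta here, an assumption of this sketch), Euler integration of the probability-flow ODE from noise recovers the data variance.

```python
import numpy as np

beta, s, n_steps = 2.0, 2.0, 500         # constant beta; data is N(0, s^2)
rng = np.random.default_rng(0)

def var_t(t):
    """Marginal variance of the VP diffusion at time t for N(0, s^2) data."""
    a2 = np.exp(-beta * t)               # alpha_t^2
    return a2 * s**2 + (1.0 - a2)

x = rng.normal(scale=np.sqrt(var_t(1.0)), size=20_000)  # start near pure noise
dt = 1.0 / n_steps
for k in range(n_steps):                 # integrate from t = 1 down to t = 0
    t = 1.0 - k * dt
    score = -x / var_t(t)                # exact Gaussian score
    drift = -0.5 * beta * x - 0.5 * beta * score  # probability-flow drift
    x -= drift * dt                      # Euler step backward in time
# x should now be approximately distributed as the data, N(0, s^2)
```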
    Bootstrap aggregation and confidence measures to improve time series causal discovery. (arXiv:2306.08946v1 [stat.ME])
Causal discovery methods have demonstrated the ability to identify the time series graphs representing the causal temporal dependency structure of dynamical systems. However, they do not include a measure of the confidence of the estimated links. Here, we introduce a novel bootstrap aggregation (bagging) and confidence measure method that is combined with time series causal discovery. This new method allows measuring confidence for the links of the time series graphs calculated by causal discovery methods. This is done by bootstrapping the original time series data set while preserving temporal dependencies. In addition to confidence measures, aggregating the bootstrapped graphs by majority voting yields a final aggregated output graph. In this work, we combine our approach with the state-of-the-art conditional-independence-based algorithm PCMCI+. With extensive numerical experiments we empirically demonstrate that, in addition to providing confidence measures for links, Bagged-PCMCI+ improves the precision and recall of its base algorithm PCMCI+. Specifically, Bagged-PCMCI+ has a higher detection power regarding adjacencies and a higher precision in orienting contemporaneous edges while at the same time showing a lower rate of false positives. These performance improvements are especially pronounced in the more challenging settings (short time sample size, large number of variables, high autocorrelation). Our bootstrap approach can also be combined with other time series causal discovery algorithms and can be of considerable use in many real-world applications, especially when confidence measures for the links are desired.  ( 2 min )
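The bagging wrapper itself is generic; this sketch uses a moving-block bootstrap to preserve temporal dependence and majority voting over estimated adjacency matrices. The `estimate_graph` argument is a stand-in for the causal discovery step (PCMCI+ in the paper); the block length and bootstrap count are illustrative.

```python
import numpy as np

def moving_block_bootstrap(X, block_len, rng):
    """Resample a (T, d) time series by concatenating random blocks,
    preserving within-block temporal dependence."""
    T = len(X)
    n_blocks = int(np.ceil(T / block_len))
    starts = rng.integers(0, T - block_len + 1, size=n_blocks)
    blocks = [X[s:s + block_len] for s in starts]
    return np.concatenate(blocks)[:T]

def bagged_graph(X, estimate_graph, n_boot=25, block_len=10, seed=0):
    """Run estimate_graph on bootstrap replicates; majority-vote the links."""
    rng = np.random.default_rng(seed)
    votes = sum(estimate_graph(moving_block_bootstrap(X, block_len, rng))
                for _ in range(n_boot))
    confidence = votes / n_boot          # per-link confidence measure
    return (confidence > 0.5).astype(int), confidence
```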
    B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding. (arXiv:2304.10577v2 [cs.LG] UPDATED)
    Estimating heterogeneous treatment effects from observational data is a crucial task across many fields, helping policy and decision-makers take better actions. There has been recent progress on robust and efficient methods for estimating the conditional average treatment effect (CATE) function, but these methods often do not take into account the risk of hidden confounding, which could arbitrarily and unknowingly bias any causal estimate based on observational data. We propose a meta-learner called the B-Learner, which can efficiently learn sharp bounds on the CATE function under limits on the level of hidden confounding. We derive the B-Learner by adapting recent results for sharp and valid bounds of the average treatment effect (Dorn et al., 2021) into the framework given by Kallus & Oprescu (2023) for robust and model-agnostic learning of conditional distributional treatment effects. The B-Learner can use any function estimator such as random forests and deep neural networks, and we prove its estimates are valid, sharp, efficient, and have a quasi-oracle property with respect to the constituent estimators under more general conditions than existing methods. Semi-synthetic experimental comparisons validate the theoretical findings, and we use real-world data to demonstrate how the method might be used in practice.
    Logarithmic Bayes Regret Bounds. (arXiv:2306.09136v1 [cs.LG])
    We derive the first finite-time logarithmic regret bounds for Bayesian bandits. For Gaussian bandits, we obtain an $O(c_h \log^2 n)$ bound, where $c_h$ is a prior-dependent constant. This matches the asymptotic lower bound of Lai (1987). Our proofs mark a technical departure from prior works, and are simple and general. To show generality, we apply our technique to linear bandits. Our bounds shed light on the value of the prior in the Bayesian setting, both in the objective and as side information given to the learner. They significantly improve upon the $\tilde{O}(\sqrt{n})$ bounds that, despite the existing lower bounds, have become standard in the literature.
    SOBER: Highly Parallel Bayesian Optimization and Bayesian Quadrature over Discrete and Mixed Spaces. (arXiv:2301.11832v3 [cs.LG] UPDATED)
    Batch Bayesian optimisation and Bayesian quadrature have been shown to be sample-efficient methods of performing optimisation and quadrature where expensive-to-evaluate objective functions can be queried in parallel. However, current methods do not scale to large batch sizes -- a frequent desideratum in practice (e.g. drug discovery or simulation-based inference). We present a novel algorithm, SOBER, which permits scalable and diversified batch global optimisation and quadrature with arbitrary acquisition functions and kernels over discrete and mixed spaces. The key to our approach is to reformulate batch selection for global optimisation as a quadrature problem, which relaxes acquisition function maximisation (non-convex) to kernel recombination (convex). Bridging global optimisation and quadrature can efficiently solve both tasks by balancing the merits of exploitative Bayesian optimisation and explorative Bayesian quadrature. We show that SOBER outperforms 11 competitive baselines on 12 synthetic and diverse real-world tasks.
    WavPool: A New Block for Deep Neural Networks. (arXiv:2306.08734v1 [cs.LG])
    Modern deep neural networks comprise many operational layers, such as dense or convolutional layers, which are often collected into blocks. In this work, we introduce a new, wavelet-transform-based network architecture that we call the multi-resolution perceptron: by adding a pooling layer, we create a new network block, the WavPool. The first step of the multi-resolution perceptron is transforming the data into its multi-resolution decomposition form by convolving the input data with filters of fixed coefficients but increasing size. Following image processing techniques, we are able to make scale and spatial information simultaneously accessible to the network without increasing the size of the data vector. WavPool outperforms a similar multilayer perceptron while using fewer parameters, and outperforms a comparable convolutional neural network by ~10% in relative accuracy on CIFAR-10.
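A toy one-dimensional analogue of the multi-resolution decomposition step, using fixed Haar coefficients; the actual WavPool operates on images with learned downstream layers:

```python
import numpy as np

def haar_multires(x):
    """Toy multi-resolution decomposition in the spirit of the WavPool block:
    repeatedly split a 1-D signal into coarse (average) and detail
    (difference) channels with fixed Haar coefficients, so each level exposes
    structure at a different scale without growing the total number of values."""
    levels = []
    while len(x) > 1:
        coarse = (x[0::2] + x[1::2]) / np.sqrt(2)
        detail = (x[0::2] - x[1::2]) / np.sqrt(2)
        levels.append(detail)
        x = coarse
    levels.append(x)  # final coarse approximation
    return levels
```

Because the Haar transform is orthonormal, the decomposition keeps the total number of coefficients and the signal energy unchanged, which is the "no increase in data vector size" property the abstract mentions.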
    Energy Trees: Regression and Classification With Structured and Mixed-Type Covariates. (arXiv:2207.04430v2 [stat.ME] UPDATED)
    The increasing complexity of data requires methods and models that can effectively handle intricate structures, as simplifying them would result in loss of information. While several analytical tools have been developed to work with complex data objects in their original form, these tools are typically limited to single-type variables. In this work, we propose energy trees as a regression and classification model capable of accommodating structured covariates of various types. Energy trees leverage energy statistics to extend the capabilities of conditional inference trees, from which they inherit sound statistical foundations, interpretability, scale invariance, and freedom from distributional assumptions. We specifically focus on functional and graph-structured covariates, while also highlighting the model's flexibility in integrating other variable types. Extensive simulation studies demonstrate the model's competitive performance in terms of variable selection and robustness to overfitting. Finally, we assess the model's predictive ability through two empirical analyses involving human biological data. Energy trees are implemented in the R package etree.
    Learning Manifold Dimensions with Conditional Variational Autoencoders. (arXiv:2302.11756v2 [cs.LG] UPDATED)
    Although the variational autoencoder (VAE) and its conditional extension (CVAE) are capable of state-of-the-art results across multiple domains, their precise behavior is still not fully understood, particularly in the context of data (like images) that lie on or near a low-dimensional manifold. For example, while prior work has suggested that the globally optimal VAE solution can learn the correct manifold dimension, a necessary (but not sufficient) condition for producing samples from the true data distribution, this has never been rigorously proven. Moreover, it remains unclear how such considerations would change when various types of conditioning variables are introduced, or when the data support is extended to a union of manifolds (e.g., as is likely the case for MNIST digits and related). In this work, we address these points by first proving that VAE global minima are indeed capable of recovering the correct manifold dimension. We then extend this result to more general CVAEs, demonstrating practical scenarios whereby the conditioning variables allow the model to adaptively learn manifolds of varying dimension across samples. Our analyses, which have practical implications for various CVAE design choices, are also supported by numerical results on both synthetic and real-world datasets.
    Semantic HELM: An Interpretable Memory for Reinforcement Learning. (arXiv:2306.09312v1 [cs.LG])
    Reinforcement learning agents deployed in the real world often have to cope with partially observable environments. Therefore, most agents employ memory mechanisms to approximate the state of the environment. Recently, there have been impressive success stories in mastering partially observable environments, mostly in the realm of computer games like Dota 2, StarCraft II, or Minecraft. However, none of these methods is interpretable: it is not comprehensible to humans how the agent decides which actions to take based on its inputs. Yet, human understanding is necessary in order to deploy such methods in high-stakes domains like autonomous driving or medical applications. We propose a novel memory mechanism that operates on human language to illuminate the decision-making process. First, we use CLIP to associate visual inputs with language tokens. Then we feed these tokens to a pretrained language model that serves the agent as memory and provides it with a coherent and interpretable representation of the past. Our memory mechanism achieves state-of-the-art performance in environments where memorizing the past is crucial to solve tasks. Further, we present situations where our memory component excels or fails, demonstrating the strengths and weaknesses of our new approach.
    Bandits with Replenishable Knapsacks: the Best of both Worlds. (arXiv:2306.08470v1 [cs.LG])
    The bandits with knapsack (BwK) framework models online decision-making problems in which an agent makes a sequence of decisions subject to resource consumption constraints. The traditional model assumes that each action consumes a non-negative amount of resources and the process ends when the initial budgets are fully depleted. We study a natural generalization of the BwK framework which allows non-monotonic resource utilization, i.e., resources can be replenished by a positive amount. We propose a best-of-both-worlds primal-dual template that can handle any online learning problem with replenishment for which a suitable primal regret minimizer exists. In particular, we provide the first positive results for the case of adversarial inputs by showing that our framework guarantees a constant competitive ratio $\alpha$ when $B=\Omega(T)$ or when the possible per-round replenishment is a positive constant. Moreover, under a stochastic input model, our algorithm yields an instance-independent $\tilde{O}(T^{1/2})$ regret bound which complements existing instance-dependent bounds for the same setting. Finally, we provide applications of our framework to some economic problems of practical relevance.
    Hyperbolic Convolution via Kernel Point Aggregation. (arXiv:2306.08862v1 [cs.LG])
    Learning representations according to the underlying geometry is of vital importance for non-Euclidean data. Studies have revealed that the hyperbolic space can effectively embed hierarchical or tree-like data. In particular, the few past years have witnessed a rapid development of hyperbolic neural networks. However, it is challenging to learn good hyperbolic representations since common Euclidean neural operations, such as convolution, do not extend to the hyperbolic space. Most hyperbolic neural networks do not embrace the convolution operation and ignore local patterns. Others either only use non-hyperbolic convolution, or miss essential properties such as equivariance to permutation. We propose HKConv, a novel trainable hyperbolic convolution which first correlates trainable local hyperbolic features with fixed kernel points placed in the hyperbolic space, then aggregates the output features within a local neighborhood. HKConv not only expressively learns local features according to the hyperbolic geometry, but also enjoys equivariance to permutation of hyperbolic points and invariance to parallel transport of a local neighborhood. We show that neural networks with HKConv layers advance state-of-the-art in various tasks.
    On the Exactness of Dantzig-Wolfe Relaxation for Rank Constrained Optimization Problems. (arXiv:2210.16191v3 [math.OC] UPDATED)
    The rank-constrained optimization problem (RCOP) minimizes a linear objective function over a prespecified closed rank-constrained domain set and $m$ generic two-sided linear matrix inequalities. Motivated by the Dantzig-Wolfe (DW) decomposition, a popular approach for solving many nonconvex optimization problems, we investigate the strength of the DW relaxation (DWR) of the RCOP, which admits the same formulation as RCOP except that the domain set is replaced by its closed convex hull. Notably, our goal is to characterize conditions under which the DWR matches RCOP for any $m$ two-sided linear matrix inequalities. From the primal perspective, we develop the first-known simultaneously necessary and sufficient conditions that achieve: (i) extreme point exactness -- all the extreme points of the DWR feasible set belong to that of the RCOP; (ii) convex hull exactness -- the DWR feasible set is identical to the closed convex hull of the RCOP feasible set; and (iii) objective exactness -- the optimal values of the DWR and RCOP coincide. The proposed conditions unify, refine, and extend the existing exactness results in the quadratically constrained quadratic program (QCQP) and fair unsupervised learning. These conditions can be used to identify new results, including the extreme point exactness for a QCQP problem that admits an inhomogeneous objective function with two homogeneous two-sided quadratic constraints and the convex hull exactness for fair SVD.
    Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning. (arXiv:2306.08590v1 [cs.LG])
    The success of SGD in deep learning has been ascribed by prior works to the implicit bias induced by high learning rate or small batch size ("SGD noise"). While prior works focused on offline learning (i.e., multiple-epoch training), we study the impact of SGD noise on online (i.e., single-epoch) learning. Through an extensive empirical analysis of image and language data, we demonstrate that large learning rate and small batch size do not confer any implicit bias advantages in online learning. In contrast to offline learning, the benefits of SGD noise in online learning are strictly computational, facilitating larger or more cost-effective gradient steps. Our work suggests that SGD in the online regime can be construed as taking noisy steps along the "golden path" of the noiseless gradient flow algorithm. We provide evidence to support this hypothesis by conducting experiments that reduce SGD noise during training and by measuring the pointwise functional distance between models trained with varying SGD noise levels, but at equivalent loss values. Our findings challenge the prevailing understanding of SGD and offer novel insights into its role in online learning.
    Entropic Issues in Likelihood-Based OOD Detection. (arXiv:2109.10794v2 [stat.ML] UPDATED)
    Deep generative models trained by maximum likelihood remain very popular methods for reasoning about data probabilistically. However, it has been observed that they can assign higher likelihoods to out-of-distribution (OOD) data than in-distribution data, thus calling into question the meaning of these likelihood values. In this work we provide a novel perspective on this phenomenon, decomposing the average likelihood into a KL divergence term and an entropy term. We argue that the latter can explain the curious OOD behaviour mentioned above, suppressing likelihood values on datasets with higher entropy. Although our idea is simple, we have not seen it explored yet in the literature. This analysis provides further explanation for the success of OOD detection methods based on likelihood ratios, as the problematic entropy term cancels out in expectation. Finally, we discuss how this observation relates to recent success in OOD detection with manifold-supported models, for which the above decomposition does not hold directly.
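The decomposition at the heart of the argument, $\mathbb{E}_q[\log p(x)] = -\mathrm{KL}(q\,\|\,p) - H(q)$, can be checked numerically for one-dimensional Gaussians (the parameter values are illustrative):

```python
import numpy as np

def demo(mu_q=1.0, sig_q=2.0, mu_p=0.0, sig_p=1.0, n=200000, seed=0):
    """Numerically verify E_q[log p(x)] = -KL(q || p) - H(q) for 1-D Gaussians:
    a Monte Carlo estimate of the average log-likelihood of model p on data
    from q, against the closed-form KL-plus-entropy decomposition."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mu_q, sig_q, n)
    logp = -0.5 * np.log(2 * np.pi * sig_p**2) - (x - mu_p)**2 / (2 * sig_p**2)
    mc = logp.mean()                                   # Monte Carlo estimate
    h_q = 0.5 * np.log(2 * np.pi * np.e * sig_q**2)    # entropy of q
    kl = (np.log(sig_p / sig_q)
          + (sig_q**2 + (mu_q - mu_p)**2) / (2 * sig_p**2) - 0.5)
    return mc, -kl - h_q
```

The higher the entropy of $q$, the lower the average likelihood even when the KL term is small, which is exactly the mechanism the abstract invokes to explain high OOD likelihoods.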
    Bayesian Fixed-Budget Best-Arm Identification. (arXiv:2211.08572v3 [cs.LG] UPDATED)
    Fixed-budget best-arm identification (BAI) is a bandit problem where the agent maximizes the probability of identifying the optimal arm within a fixed budget of observations. In this work, we study this problem in the Bayesian setting. We propose a Bayesian elimination algorithm and derive an upper bound on its probability of misidentifying the optimal arm. The bound reflects the quality of the prior and is the first distribution-dependent bound in this setting. We prove it using a frequentist-like argument, where we carry the prior through, and then integrate out the bandit instance at the end. We also provide a lower bound on the probability of misidentification in a $2$-armed Bayesian bandit and show that our upper bound (almost) matches it for any budget. Our experiments show that Bayesian elimination is superior to frequentist methods and competitive with the state-of-the-art Bayesian algorithms that have no guarantees in our setting.
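A minimal sketch of a Bayesian elimination scheme for Gaussian bandits; the halving schedule and the conjugate-prior posterior update below are illustrative simplifications, not the paper's exact algorithm:

```python
import numpy as np

def bayesian_elimination(means, budget, prior_mean=0.0, prior_var=1.0,
                         noise_var=1.0, seed=0):
    """Fixed-budget best-arm identification sketch: split the budget into
    halving rounds, pull surviving arms equally, and drop the half with
    the lowest Gaussian posterior mean."""
    rng = np.random.default_rng(seed)
    arms = list(range(len(means)))
    n_rounds = int(np.ceil(np.log2(len(arms))))
    per_round = budget // n_rounds
    sums = np.zeros(len(means))
    counts = np.zeros(len(means))
    for _ in range(n_rounds):
        pulls = max(per_round // len(arms), 1)
        for a in arms:
            sums[a] += rng.normal(means[a], np.sqrt(noise_var), pulls).sum()
            counts[a] += pulls
        # Gaussian posterior mean under a N(prior_mean, prior_var) prior
        post = (prior_mean / prior_var + sums / noise_var) / \
               (1.0 / prior_var + counts / noise_var)
        arms.sort(key=lambda a: post[a], reverse=True)
        arms = arms[:max(len(arms) // 2, 1)]
    return arms[0]
```

The prior enters through the posterior-mean shrinkage, which is where the quality of the prior affects the misidentification probability in the paper's bound.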
    Anticipatory Music Transformer. (arXiv:2306.08620v1 [cs.SD])
    We introduce anticipation: a method for constructing a controllable generative model of a temporal point process (the event process) conditioned asynchronously on realizations of a second, correlated process (the control process). We achieve this by interleaving sequences of events and controls, such that controls appear following stopping times in the event sequence. This work is motivated by problems arising in the control of symbolic music generation. We focus on infilling control tasks, whereby the controls are a subset of the events themselves, and conditional generation completes a sequence of events given the fixed control events. We train anticipatory infilling models using the large and diverse Lakh MIDI music dataset. These models match the performance of autoregressive models for prompted music generation, with the additional capability to perform infilling control tasks, including accompaniment. Human evaluators report that an anticipatory model produces accompaniments with similar musicality to even music composed by humans over a 20-second clip.
    Understanding Optimization of Deep Learning. (arXiv:2306.09338v1 [cs.LG])
    This article provides a comprehensive understanding of optimization in deep learning, with a primary focus on the challenges of gradient vanishing and gradient exploding, which normally lead to diminished model representational ability and training instability, respectively. We analyze these two challenges through several strategic measures, including the improvement of gradient flow and the imposition of constraints on a network's Lipschitz constant. To help understand the current optimization methodologies, we categorize them into two classes: explicit optimization and implicit optimization. Explicit optimization methods involve direct manipulation of optimizer parameters, including weight, gradient, learning rate, and weight decay. Implicit optimization methods, by contrast, focus on improving the overall landscape of a network by enhancing its modules, such as residual shortcuts, normalization methods, attention mechanisms, and activations. In this article, we provide an in-depth analysis of these two optimization classes and undertake a thorough examination of the Jacobian matrices and the Lipschitz constants of many widely used deep learning modules, highlighting existing issues as well as potential improvements. Moreover, we conduct a series of analytical experiments to substantiate our theoretical discussions. This article does not aim to propose a new optimizer or network. Rather, our intention is to present a comprehensive understanding of optimization in deep learning. We hope that this article will assist readers in gaining deeper insight into this field and encourage the development of more robust, efficient, and high-performing models.
    A Gromov--Wasserstein Geometric View of Spectrum-Preserving Graph Coarsening. (arXiv:2306.08854v1 [cs.LG])
    Graph coarsening is a technique for solving large-scale graph problems by working on a smaller version of the original graph, and possibly interpolating the results back to the original graph. It has a long history in scientific computing and has recently gained popularity in machine learning, particularly in methods that preserve the graph spectrum. This work studies graph coarsening from a different perspective, developing a theory for preserving graph distances and proposing a method to achieve this. The geometric approach is useful when working with a collection of graphs, such as in graph classification and regression. In this study, we consider a graph as an element on a metric space equipped with the Gromov--Wasserstein (GW) distance, and bound the difference between the distance of two graphs and their coarsened versions. Minimizing this difference can be done using the popular weighted kernel $K$-means method, which improves existing spectrum-preserving methods with the proper choice of the kernel. The study includes a set of experiments to support the theory and method, including approximating the GW distance, preserving the graph spectrum, classifying graphs using spectral information, and performing regression using graph convolutional networks. Code is available at https://github.com/ychen-stat-ml/GW-Graph-Coarsening .
    MPSA-DenseNet: A novel deep learning model for English accent classification. (arXiv:2306.08798v1 [cs.CL])
    This paper presents three innovative deep learning models for English accent classification: Multi-DenseNet, PSA-DenseNet, and MPSA-DenseNet, which combine multi-task learning and the PSA module attention mechanism with DenseNet. We applied these models to data collected from six dialects of English across native English-speaking regions (Britain, the United States, Scotland) and non-native English-speaking regions (China, Germany, India). Our experimental results show a significant improvement in classification accuracy, particularly with MPSA-DenseNet, which outperforms all other models, including DenseNet and EPSA models previously used for accent identification. Our findings indicate that MPSA-DenseNet is a highly promising model for accurately identifying English accents.
    Tight Certification of Adversarially Trained Neural Networks via Nonconvex Low-Rank Semidefinite Relaxations. (arXiv:2211.17244v3 [cs.LG] UPDATED)
    Adversarial training is well-known to produce high-quality neural network models that are empirically robust against adversarial perturbations. Nevertheless, once a model has been adversarially trained, one often desires a certification that the model is truly robust against all future attacks. Unfortunately, when faced with adversarially trained models, all existing approaches have significant trouble making certifications that are strong enough to be practically useful. Linear programming (LP) techniques in particular face a "convex relaxation barrier" that prevents them from making high-quality certifications, even after refinement with mixed-integer linear programming (MILP) and branch-and-bound (BnB) techniques. In this paper, we propose a nonconvex certification technique, based on a low-rank restriction of a semidefinite programming (SDP) relaxation. The nonconvex relaxation makes certifications comparable in strength to much more expensive SDP methods, while optimizing over dramatically fewer variables, comparable to much weaker LP methods. Despite nonconvexity, we show how off-the-shelf local optimization algorithms can be used to achieve and to certify global optimality in polynomial time. Our experiments find that the nonconvex relaxation almost completely closes the gap towards exact certification of adversarially trained models.
    MMD-FUSE: Learning and Combining Kernels for Two-Sample Testing Without Data Splitting. (arXiv:2306.08777v1 [stat.ML])
    We propose novel statistics which maximise the power of a two-sample test based on the Maximum Mean Discrepancy (MMD), by adapting over the set of kernels used in defining it. For finite sets, this reduces to combining (normalised) MMD values under each of these kernels via a weighted soft maximum. Exponential concentration bounds are proved for our proposed statistics under the null and alternative. We further show how these kernels can be chosen in a data-dependent but permutation-independent way, in a well-calibrated test, avoiding data splitting. This technique applies more broadly to general permutation-based MMD testing, and includes the use of deep kernels with features learnt using unsupervised models such as auto-encoders. We highlight the applicability of our MMD-FUSE test on both synthetic low-dimensional and real-world high-dimensional data, and compare its performance in terms of power against current state-of-the-art kernel tests.
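The aggregation idea, combining per-kernel MMD estimates through a weighted soft maximum, can be sketched as follows; the paper's normalisation of the MMD values and its permutation-based calibration are omitted for brevity:

```python
import numpy as np

def mmd2_unbiased(x, y, bandwidth):
    """Unbiased estimate of squared MMD with a Gaussian kernel."""
    def k(a, b):
        d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2 * bandwidth ** 2))
    kxx, kyy, kxy = k(x, x), k(y, y), k(x, y)
    n, m = len(x), len(y)
    term_x = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    return term_x + term_y - 2 * kxy.mean()

def fused_statistic(x, y, bandwidths, lam=10.0):
    """Combine per-kernel MMD values by a weighted soft maximum
    (log-sum-exp), the aggregation idea behind MMD-FUSE; weights are
    uniform here and `lam` controls how closely it tracks the max."""
    vals = np.array([mmd2_unbiased(x, y, h) for h in bandwidths])
    w = np.full(len(bandwidths), 1.0 / len(bandwidths))
    return np.log(np.sum(w * np.exp(lam * vals))) / lam
```

A soft maximum stays differentiable in the weights and concentrates like a maximum, which is what makes the exponential concentration arguments in the abstract tractable.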
    Exact Count of Boundary Pieces of ReLU Classifiers: Towards the Proper Complexity Measure for Classification. (arXiv:2306.08805v1 [stat.ML])
    Classic learning theory suggests that proper regularization is the key to good generalization and robustness. In classification, current training schemes only target the complexity of the classifier itself, which can be misleading and ineffective. Instead, we advocate directly measuring the complexity of the decision boundary. Existing literature is limited in this area with few well-established definitions of boundary complexity. As a proof of concept, we start by analyzing ReLU neural networks, whose boundary complexity can be conveniently characterized by the number of affine pieces. With the help of tropical geometry, we develop a novel method that can explicitly count the exact number of boundary pieces, and as a by-product, the exact number of total affine pieces. Numerical experiments are conducted and distinctive properties of our boundary complexity are uncovered. First, the boundary piece count appears largely independent of other measures, e.g., total piece count, and $l_2$ norm of weights, during the training process. Second, the boundary piece count is negatively correlated with robustness, where popular robust training techniques, e.g., adversarial training or random noise injection, are found to reduce the number of boundary pieces.
    Diffusion-based Conditional ECG Generation with Structured State Space Models. (arXiv:2301.08227v2 [eess.SP] UPDATED)
    Synthetic data generation is a promising solution to address privacy issues with the distribution of sensitive health data. Recently, diffusion models have set new standards for generative models for different data modalities. Also very recently, structured state space models emerged as a powerful modeling paradigm to capture long-term dependencies in time series. We put forward SSSD-ECG, as the combination of these two technologies, for the generation of synthetic 12-lead electrocardiograms conditioned on more than 70 ECG statements. Due to a lack of reliable baselines, we also propose conditional variants of two state-of-the-art unconditional generative models. We thoroughly evaluate the quality of the generated samples, by evaluating pretrained classifiers on the generated data and by evaluating the performance of a classifier trained only on synthetic data, where SSSD-ECG clearly outperforms its GAN-based competitors. We demonstrate the soundness of our approach through further experiments, including conditional class interpolation and a clinical Turing test demonstrating the high quality of the SSSD-ECG samples across a wide range of conditions.
    Shuffle SGD is Always Better than SGD: Improved Analysis of SGD with Arbitrary Data Orders. (arXiv:2305.19259v2 [cs.LG] UPDATED)
    Stochastic Gradient Descent (SGD) algorithms are widely used in optimizing neural networks, with Random Reshuffling (RR) and Single Shuffle (SS) being popular choices for cycling through random or single permutations of the training data. However, the convergence properties of these algorithms in the non-convex case are not fully understood. Existing results suggest that, in realistic training scenarios where the number of epochs is smaller than the training set size, RR may perform worse than SGD. In this paper, we analyze a general SGD algorithm that allows for arbitrary data orderings and show improved convergence rates for non-convex functions. Specifically, our analysis reveals that SGD with random and single shuffling is always faster or at least as good as classical SGD with replacement, regardless of the number of iterations. Overall, our study highlights the benefits of using SGD with random/single shuffling and provides new insights into its convergence properties for non-convex optimization.
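The three data orderings being compared differ only in how indices are drawn each epoch; a generic driver makes this explicit (the toy `grad_i` objective and step size in the test are illustrative):

```python
import numpy as np

def sgd(grad_i, w0, n, epochs, lr, order="rr", seed=0):
    """Run SGD under the three data orderings compared in the paper:
    with-replacement sampling ('iid'), random reshuffling ('rr', a fresh
    permutation every epoch), and single shuffle ('ss', one fixed
    permutation reused). `grad_i(w, i)` is the gradient of component i."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    fixed = rng.permutation(n)             # the single-shuffle permutation
    for _ in range(epochs):
        if order == "iid":
            idx = rng.integers(0, n, size=n)
        elif order == "rr":
            idx = rng.permutation(n)
        else:  # "ss"
            idx = fixed
        for i in idx:
            w -= lr * grad_i(w, i)
    return w
```

On a simple least-squares objective all three orderings converge to a neighbourhood of the minimizer; the paper's contribution is showing that the shuffled variants are never slower than with-replacement SGD in the non-convex setting.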
    Class-Conditional Conformal Prediction With Many Classes. (arXiv:2306.09335v1 [stat.ML])
    Standard conformal prediction methods provide a marginal coverage guarantee, which means that for a random test point, the conformal prediction set contains the true label with a user-chosen probability. In many classification problems, we would like to obtain a stronger guarantee -- that for test points of a specific class, the prediction set contains the true label with the same user-chosen probability. Existing conformal prediction methods do not work well when there is a limited amount of labeled data per class, as is often the case in real applications where the number of classes is large. We propose a method called clustered conformal prediction, which clusters together classes that have "similar" conformal scores and then performs conformal prediction at the cluster level. Based on empirical evaluation across four image data sets with many (up to 1000) classes, we find that clustered conformal typically outperforms existing methods in terms of class-conditional coverage and set size metrics.
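A rough sketch of the cluster-then-calibrate idea; grouping classes by the median of their calibration scores is a stand-in for the paper's clustering of conformal score distributions:

```python
import numpy as np

def clustered_thresholds(scores, labels, n_classes, n_clusters=3, alpha=0.1):
    """Group classes with similar calibration-score distributions and
    compute one conformal quantile per cluster instead of per class,
    pooling the scarce per-class calibration data."""
    medians = np.array([np.median(scores[labels == c]) for c in range(n_classes)])
    # assign classes to clusters by binning their median scores
    edges = np.quantile(medians, np.linspace(0, 1, n_clusters + 1)[1:-1])
    cluster_of = np.digitize(medians, edges)
    qhat = np.full(n_clusters, np.inf)
    for k in range(n_clusters):
        pooled = scores[np.isin(labels, np.where(cluster_of == k)[0])]
        if len(pooled) == 0:
            continue  # empty cluster keeps a vacuous (infinite) threshold
        level = min(np.ceil((len(pooled) + 1) * (1 - alpha)) / len(pooled), 1.0)
        qhat[k] = np.quantile(pooled, level)
    return cluster_of, qhat

def prediction_set(test_scores, cluster_of, qhat):
    """Include class y iff its nonconformity score is within its
    cluster's threshold."""
    return [y for y, s in enumerate(test_scores) if s <= qhat[cluster_of[y]]]
```

Calibrating per cluster rather than per class is the essential trade: a coarser conditional guarantee in exchange for stable quantile estimates when labeled data per class is scarce.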
    Noise Stability Optimization for Flat Minima with Optimal Convergence Rates. (arXiv:2306.08553v1 [cs.LG])
    We consider finding flat, local minimizers by adding average weight perturbations. Given a nonconvex function $f: \mathbb{R}^d \rightarrow \mathbb{R}$ and a $d$-dimensional distribution $\mathcal{P}$ which is symmetric at zero, we perturb the weight of $f$ and define $F(W) = \mathbb{E}[f({W + U})]$, where $U$ is a random sample from $\mathcal{P}$. This injection induces regularization through the Hessian trace of $f$ for small, isotropic Gaussian perturbations. Thus, the weight-perturbed function biases to minimizers with low Hessian trace. Several prior works have studied settings related to this weight-perturbed function by designing algorithms to improve generalization. Still, convergence rates are not known for finding minima under the average perturbations of the function $F$. This paper considers an SGD-like algorithm that injects random noise before computing gradients while leveraging the symmetry of $\mathcal{P}$ to reduce variance. We then provide a rigorous analysis, showing matching upper and lower bounds of our algorithm for finding an approximate first-order stationary point of $F$ when the gradient of $f$ is Lipschitz-continuous. We empirically validate our algorithm for several image classification tasks with various architectures. Compared to sharpness-aware minimization, we note a 12.6% and 7.8% drop in the Hessian trace and top eigenvalue of the found minima, respectively, averaged over eight datasets. Ablation studies validate the benefit of the design of our algorithm.
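The variance-reduction trick enabled by the symmetry of $\mathcal{P}$ can be sketched directly: evaluate the gradient at the antithetic pair $W+U$ and $W-U$ and average. The learning rate and noise scale below are illustrative:

```python
import numpy as np

def noise_stable_sgd(grad_f, w0, sigma=0.1, lr=0.05, steps=500, seed=0):
    """SGD-like sketch of the weight-perturbation scheme: draw symmetric
    Gaussian noise u, then average the gradients at w + u and w - u.
    The antithetic pair exploits the symmetry of the perturbation
    distribution to reduce the variance of the gradient estimate of
    F(w) = E[f(w + u)]."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        u = sigma * rng.normal(size=w.shape)
        g = 0.5 * (grad_f(w + u) + grad_f(w - u))  # unbiased for grad F(w)
        w -= lr * g
    return w
```

For a quadratic objective the antithetic average cancels the noise exactly, so the iterates contract deterministically toward the minimizer; for general $f$ the cancellation is only partial but still lowers the variance.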
    Extrapolated cross-validation for randomized ensembles. (arXiv:2302.13511v2 [stat.ME] UPDATED)
    Ensemble methods such as bagging and random forests are ubiquitous in various fields, from finance to genomics. Despite their prevalence, the question of the efficient tuning of ensemble parameters has received relatively little attention. This paper introduces a cross-validation method, ECV (Extrapolated Cross-Validation), for tuning the ensemble and subsample sizes in randomized ensembles. Our method builds on two primary ingredients: initial estimators for small ensemble sizes using out-of-bag errors and a novel risk extrapolation technique that leverages the structure of prediction risk decomposition. By establishing uniform consistency of our risk extrapolation technique over ensemble and subsample sizes, we show that ECV yields $\delta$-optimal (with respect to the oracle-tuned risk) ensembles for squared prediction risk. Our theory accommodates general ensemble predictors, only requires mild moment assumptions, and allows for high-dimensional regimes where the feature dimension grows with the sample size. As a practical case study, we employ ECV to predict surface protein abundances from gene expressions in single-cell multiomics using random forests. In comparison to sample-split cross-validation and $K$-fold cross-validation, ECV achieves higher accuracy while avoiding sample splitting. At the same time, its computational cost is considerably lower owing to the use of the risk extrapolation technique. Additional numerical results validate the finite-sample accuracy of ECV for several common ensemble predictors under a computational constraint on the maximum ensemble size.
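For squared prediction risk, the risk of a size-$M$ randomized ensemble decomposes as $R(M) = a + b/M$, so out-of-bag estimates at two small ensemble sizes pin down the whole curve. This is the extrapolation step in miniature (the consistency analysis and the OOB estimation of $r_1, r_2$ are the paper's contribution):

```python
import numpy as np

def extrapolate_risk(r1, r2, M):
    """ECV's extrapolation step (sketch): given risk estimates r1 = R(1)
    and r2 = R(2) from ensembles of size one and two, solve
    R(M) = a + b / M for (a, b) and evaluate the curve at any M."""
    b = 2.0 * (r1 - r2)   # from R(1) - R(2) = b / 2
    a = r1 - b            # from R(1) = a + b
    return a + b / np.asarray(M, dtype=float)
```

As $M \to \infty$ the extrapolated risk tends to $a$, the irreducible part, which is what lets ECV compare large ensemble sizes without ever fitting them.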
    A Heavy-Tailed Algebra for Probabilistic Programming. (arXiv:2306.09262v1 [stat.ML])
    Despite the successes of probabilistic models based on passing noise through neural networks, recent work has identified that such methods often fail to capture tail behavior accurately, unless the tails of the base distribution are appropriately calibrated. To overcome this deficiency, we propose a systematic approach for analyzing the tails of random variables, and we illustrate how this approach can be used during the static analysis (before drawing samples) pass of a probabilistic programming language compiler. To characterize how the tails change under various operations, we develop an algebra which acts on a three-parameter family of tail asymptotics and which is based on the generalized Gamma distribution. Our algebraic operations are closed under addition and multiplication; they are capable of distinguishing sub-Gaussians with differing scales; and they handle ratios sufficiently well to reproduce the tails of most important statistical distributions directly from their definitions. Our empirical results confirm that inference algorithms that leverage our heavy-tailed algebra attain superior performance across a number of density modeling and variational inference tasks.
    On Certified Generalization in Structured Prediction. (arXiv:2306.09112v1 [stat.ML])
    In structured prediction, target objects have rich internal structure which does not factorize into independent components and violates common i.i.d. assumptions. This challenge becomes apparent through the exponentially large output space in applications such as image segmentation or scene graph generation. We present a novel PAC-Bayesian risk bound for structured prediction wherein the rate of generalization scales not only with the number of structured examples but also with their size. The underlying assumption, conforming to ongoing research on generative models, is that data are generated by the Knothe-Rosenblatt rearrangement of a factorizing reference measure. This allows us to explicitly distill the structure between random output variables into a Wasserstein dependency matrix. Our work makes a preliminary step towards leveraging powerful generative models to establish generalization bounds for discriminative downstream tasks in the challenging setting of structured prediction.
    Batches Stabilize the Minimum Norm Risk in High Dimensional Overparameterized Linear Regression. (arXiv:2306.08432v1 [cs.LG])
    Learning algorithms that divide the data into batches are prevalent in many machine-learning applications, typically offering useful trade-offs between computational efficiency and performance. In this paper, we examine the benefits of batch-partitioning through the lens of a minimum-norm overparameterized linear regression model with isotropic Gaussian features. We suggest a natural small-batch version of the minimum-norm estimator, and derive an upper bound on its quadratic risk, showing it is inversely proportional to the noise level as well as to the overparameterization ratio, for the optimal choice of batch size. In contrast to minimum-norm, our estimator admits a stable risk behavior that is monotonically increasing in the overparameterization ratio, eliminating both the blowup at the interpolation point and the double-descent phenomenon. Interestingly, we observe that this implicit regularization offered by the batch partition is partially explained by feature overlap between the batches. Our bound is derived via a novel combination of techniques, in particular normal approximation in the Wasserstein metric of noisy projections over random subspaces.
    Predicting Real-time Crash Risks during Hurricane Evacuation Using Connected Vehicle Data. (arXiv:2306.08682v1 [stat.ML])
    Hurricane evacuation, ordered to save lives of people of coastal regions, generates high traffic demand with increased crash risk. To mitigate such risk, transportation agencies need to anticipate highway locations with high crash risks to deploy appropriate countermeasures. With ubiquitous sensors and communication technologies, it is now possible to retrieve micro-level vehicular data containing individual vehicle trajectory and speed information. Such high-resolution vehicle data, potentially available in real time, can be used to assess prevailing traffic safety conditions. Using vehicle speed and acceleration profiles, potential crash risks can be predicted in real time. Previous studies on real-time crash risk prediction mainly used data from infrastructure-based sensors which may not cover many road segments. In this paper, we present methods to determine potential crash risks during hurricane evacuation from an emerging alternative data source known as connected vehicle data. Such data contain vehicle location, speed, and acceleration information collected at a very high frequency (less than 30 seconds). To predict potential crash risks, we utilized a dataset collected during the evacuation period of Hurricane Ida on Interstate-10 (I-10) in the state of Louisiana. Multiple machine learning models were trained considering weather features and different traffic characteristics extracted from the connected vehicle data in 5-minute intervals. The results indicate that the Gaussian Process Boosting (GPBoost) and Extreme Gradient Boosting (XGBoost) models perform better (recall = 0.91) than other models. The real-time connected vehicle data for crash risks assessment will allow traffic managers to efficiently utilize resources to proactively take safety measures.
    Integrating Uncertainty Awareness into Conformalized Quantile Regression. (arXiv:2306.08693v1 [stat.ME])
    Conformalized Quantile Regression (CQR) is a recently proposed method for constructing prediction intervals for a response $Y$ given covariates $X$, without making distributional assumptions. However, as we demonstrate empirically, existing constructions of CQR can be ineffective for problems where the quantile regressors perform better in certain parts of the feature space than others. The reason is that the prediction intervals of CQR do not distinguish between two forms of uncertainty: first, the variability of the conditional distribution of $Y$ given $X$ (i.e., aleatoric uncertainty), and second, our uncertainty in estimating this conditional distribution (i.e., epistemic uncertainty). This can lead to uneven coverage, with intervals that are overly wide (or overly narrow) in regions where epistemic uncertainty is low (or high). To address this, we propose a new variant of the CQR methodology, Uncertainty-Aware CQR (UACQR), that explicitly separates these two sources of uncertainty to adjust quantile regressors differentially across the feature space. Compared to CQR, our methods enjoy the same distribution-free theoretical guarantees for coverage properties, while demonstrating in our experiments stronger conditional coverage in simulated settings and tighter intervals on a range of real-world data sets.
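    For context, the CQR construction that UACQR modifies computes a single conformity-score quantile on a calibration set and widens the estimated quantile band by that amount everywhere; a minimal sketch with a hard-coded toy band (the UACQR adjustment itself, which varies across the feature space, is not shown):

```python
import math

def conformity_score(y, q_lo, q_hi):
    # distance by which y falls outside the band [q_lo, q_hi] (negative if inside)
    return max(q_lo - y, y - q_hi)

def cqr_adjustment(scores, alpha):
    """CQR calibration step: the empirical quantile of conformity scores used
    to widen (or shrink) the estimated quantile band to guarantee coverage."""
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))  # order-statistic index
    return sorted(scores)[k - 1]

# toy calibration set: estimated band [0, 1] for every x, observed responses ys
ys = [0.2, 0.5, 1.1, -0.3, 0.9, 0.4, 1.4, 0.6, 0.05, 0.7]
scores = [conformity_score(y, 0.0, 1.0) for y in ys]
q_hat = cqr_adjustment(scores, alpha=0.2)
# prediction interval at a new x: [q_lo(x) - q_hat, q_hi(x) + q_hat]
print(round(q_hat, 2))  # -> 0.3
```

    Because `q_hat` is one global constant, the correction cannot distinguish regions where the quantile regressors are well estimated from regions where they are not, which is precisely the uneven-coverage issue the abstract raises.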
    Private Federated Learning Without a Trusted Server: Optimal Algorithms for Convex Losses. (arXiv:2106.09779v8 [cs.LG] UPDATED)
    This paper studies federated learning (FL)--especially cross-silo FL--with data from people who do not trust the server or other silos. In this setting, each silo (e.g. hospital) has data from different people (e.g. patients) and must maintain the privacy of each person's data (e.g. medical record), even if the server or other silos act as adversarial eavesdroppers. This requirement motivates the study of Inter-Silo Record-Level Differential Privacy (ISRL-DP), which requires silo i's communications to satisfy record/item-level differential privacy (DP). ISRL-DP ensures that the data of each person (e.g. patient) in silo i (e.g. hospital i) cannot be leaked. ISRL-DP is different from well-studied privacy notions. Central and user-level DP assume that people trust the server/other silos. On the other end of the spectrum, local DP assumes that people do not trust anyone at all (even their own silo). Sitting between central and local DP, ISRL-DP makes the realistic assumption (in cross-silo FL) that people trust their own silo, but not the server or other silos. In this work, we provide tight (up to logarithms) upper and lower bounds for ISRL-DP FL with convex/strongly convex loss functions and homogeneous (i.i.d.) silo data. Remarkably, we show that similar bounds are attainable for smooth losses with arbitrary heterogeneous silo data distributions, via an accelerated ISRL-DP algorithm. We also provide tight upper and lower bounds for ISRL-DP federated empirical risk minimization, and use acceleration to attain the optimal bounds in fewer rounds of communication than the state-of-the-art. Finally, with a secure "shuffler" to anonymize silo messages (but without a trusted server), our algorithm attains the optimal central DP rates under more practical trust assumptions. Numerical experiments show favorable privacy-accuracy tradeoffs for our algorithm in classification and regression tasks.
    Multi-class Graph Clustering via Approximated Effective $p$-Resistance. (arXiv:2306.08617v1 [cs.LG])
    This paper develops an approximation to the (effective) $p$-resistance and applies it to multi-class clustering. Spectral methods based on the graph Laplacian and its generalization to the graph $p$-Laplacian have been a backbone of non-euclidean clustering techniques. The advantage of the $p$-Laplacian is that the parameter $p$ induces a controllable bias on cluster structure. The drawback of $p$-Laplacian eigenvector based methods is that the third and higher eigenvectors are difficult to compute. Thus, instead, we are motivated to use the $p$-resistance induced by the $p$-Laplacian for clustering. For $p$-resistance, small $p$ biases towards clusters with high internal connectivity while large $p$ biases towards clusters of small ``extent,'' that is a preference for smaller shortest-path distances between vertices in the cluster. However, the $p$-resistance is expensive to compute. We overcome this by developing an approximation to the $p$-resistance. We prove upper and lower bounds on this approximation and observe that it is exact when the graph is a tree. We also provide theoretical justification for the use of $p$-resistance for clustering. Finally, we provide experiments comparing our approximated $p$-resistance clustering to other $p$-Laplacian based methods.
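    To make the exactness-on-trees remark concrete in the familiar case $p = 2$: with unit edge weights, the effective resistance between two vertices of a tree is simply the length of the unique path between them (resistors in series), so a BFS suffices. A small sketch:

```python
from collections import deque

def effective_resistance_tree(adj, u, v):
    """For p = 2 and unit edge weights, the effective resistance between two
    tree vertices equals the length of the unique u-v path (series resistors),
    matching the remark that the approximation is exact on trees.
    BFS recovers that path length."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return dist[x]
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    raise ValueError("v not reachable from u")

# a small tree: edges 0-1, 1-2, 1-3, 3-4
adj = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3]}
print(effective_resistance_tree(adj, 0, 4))  # -> 3 (path 0-1-3-4)
```

    On graphs with cycles, parallel paths reduce the resistance below the shortest-path distance, which is where the paper's approximation and its upper/lower bounds become necessary.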
    Fit Like You Sample: Sample-Efficient Generalized Score Matching from Fast Mixing Markov Chains. (arXiv:2306.09332v1 [cs.DS])
    Score matching is an approach to learning probability distributions parametrized up to a constant of proportionality (e.g. Energy-Based Models). The idea is to fit the score of the distribution, rather than the likelihood, thus avoiding the need to evaluate the constant of proportionality. While there's a clear algorithmic benefit, the statistical ``cost'' can be steep: recent work by Koehler et al. 2022 showed that for distributions that have poor isoperimetric properties (a large Poincar\'e or log-Sobolev constant), score matching is substantially statistically less efficient than maximum likelihood. However, many natural realistic distributions -- e.g. multimodal distributions as simple as a mixture of two Gaussians in one dimension -- have a poor Poincar\'e constant. In this paper, we show a close connection between the mixing time of an arbitrary Markov process with generator $\mathcal{L}$ and a generalized score matching loss that tries to fit $\frac{\mathcal{L} p}{p}$. If $\mathcal{L}$ is the generator of a Markov process corresponding to a continuous version of simulated tempering, we show the corresponding generalized score matching loss is a Gaussian-convolution annealed score matching loss, akin to the one proposed in Song and Ermon 2019. Moreover, we show that if the distribution being learned is a finite mixture of Gaussians in $d$ dimensions with a shared covariance, the sample complexity of annealed score matching is polynomial in the ambient dimension, the diameter of the means, and the smallest and largest eigenvalues of the covariance -- obviating the Poincar\'e constant-based lower bounds of the basic score matching loss shown in Koehler et al. 2022. This is the first result characterizing the benefits of annealing for score matching -- a crucial component in more sophisticated score-based approaches like Song and Ermon 2019.
    Safeguarding Data in Multimodal AI: A Differentially Private Approach to CLIP Training. (arXiv:2306.08173v1 [cs.LG])
    The surge in multimodal AI's success has sparked concerns over data privacy in vision-and-language tasks. While CLIP has revolutionized multimodal learning through joint training on images and text, its potential to unintentionally disclose sensitive information necessitates the integration of privacy-preserving mechanisms. We introduce a differentially private adaptation of the Contrastive Language-Image Pretraining (CLIP) model that effectively addresses privacy concerns while retaining accuracy. Our proposed method, Dp-CLIP, is rigorously evaluated on benchmark datasets encompassing diverse vision-and-language tasks such as image classification and visual question answering. We demonstrate that our approach retains performance on par with the standard non-private CLIP model. Furthermore, we analyze our proposed algorithm under linear representation settings. We derive the convergence rate of our algorithm and show a trade-off between utility and privacy when gradients are clipped per-batch and the loss function does not satisfy smoothness conditions assumed in the literature for the analysis of DP-SGD.  ( 2 min )
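    The per-batch clip-then-noise step whose non-smooth analysis the abstract refers to can be sketched generically (this is the standard DP-SGD-style primitive, not Dp-CLIP's full training loop; the clipping norm and noise multiplier below are arbitrary):

```python
import math
import random

def privatize_gradient(grad, clip_norm, noise_multiplier, rng):
    """Generic clip-and-noise step: rescale the gradient so its L2 norm is at
    most clip_norm, then add isotropic Gaussian noise with standard deviation
    noise_multiplier * clip_norm to each coordinate."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [g * scale for g in grad]
    sigma = noise_multiplier * clip_norm
    return [g + rng.gauss(0.0, sigma) for g in clipped]

rng = random.Random(0)
noisy = privatize_gradient([3.0, 4.0], clip_norm=1.0, noise_multiplier=0.5, rng=rng)
print(len(noisy))  # -> 2
```

    Clipping bounds each batch's influence on the update (the sensitivity), and the Gaussian noise converts that bound into a differential privacy guarantee; the clipping is also what breaks the smoothness assumptions used in standard DP-SGD analyses.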
    Provably Efficient Offline Reinforcement Learning with Perturbed Data Sources. (arXiv:2306.08364v1 [stat.ML])
    Existing theoretical studies on offline reinforcement learning (RL) mostly consider a dataset sampled directly from the target task. In practice, however, data often come from several heterogeneous but related sources. Motivated by this gap, this work aims at rigorously understanding offline RL with multiple datasets that are collected from randomly perturbed versions of the target task instead of from itself. An information-theoretic lower bound is derived, which reveals a necessary requirement on the number of involved sources in addition to that on the number of data samples. Then, a novel HetPEVI algorithm is proposed, which simultaneously considers the sample uncertainties from a finite number of data samples per data source and the source uncertainties due to a finite number of available data sources. Theoretical analyses demonstrate that HetPEVI can solve the target task as long as the data sources collectively provide a good data coverage. Moreover, HetPEVI is demonstrated to be optimal up to a polynomial factor of the horizon length. Finally, the study is extended to offline Markov games and offline robust RL, which demonstrates the generality of the proposed designs and theoretical analyses.  ( 2 min )
    Implicit Compressibility of Overparametrized Neural Networks Trained with Heavy-Tailed SGD. (arXiv:2306.08125v1 [stat.ML])
    Neural network compression has been an increasingly important subject, due to its practical implications in terms of reducing the computational requirements and its theoretical implications, as there is an explicit connection between compressibility and the generalization error. Recent studies have shown that the choice of the hyperparameters of stochastic gradient descent (SGD) can have an effect on the compressibility of the learned parameter vector. Even though these results have shed some light on the role of the training dynamics over compressibility, they relied on unverifiable assumptions and the resulting theory does not provide a practical guideline due to its implicitness. In this study, we propose a simple modification for SGD, such that the outputs of the algorithm will be provably compressible without making any nontrivial assumptions. We consider a one-hidden-layer neural network trained with SGD and we inject additive heavy-tailed noise to the iterates at each iteration. We then show that, for any compression rate, there exists a level of overparametrization (i.e., the number of hidden units), such that the output of the algorithm will be compressible with high probability. To achieve this result, we make two main technical contributions: (i) we build on a recent study on stochastic analysis and prove a 'propagation of chaos' result with improved rates for a class of heavy-tailed stochastic differential equations, and (ii) we derive strong-error estimates for their Euler discretization. We finally illustrate our approach on experiments, where the results suggest that the proposed approach achieves compressibility with a slight compromise from the training and test error.  ( 3 min )
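    The proposed modification is conceptually simple: add an independent heavy-tailed (alpha-stable) noise term to each SGD iterate. A sketch on a toy quadratic, using the Chambers-Mallows-Stuck sampler for symmetric alpha-stable noise (step size and noise scale are arbitrary choices here, and the one-dimensional objective stands in for the one-hidden-layer network):

```python
import math
import random

def stable_noise(alpha, rng):
    """Chambers-Mallows-Stuck sampler for a standard symmetric alpha-stable
    variable (heavy tails for alpha < 2; alpha = 2 recovers the Gaussian)."""
    v = rng.uniform(-math.pi / 2.0, math.pi / 2.0)
    w = rng.expovariate(1.0)
    return (math.sin(alpha * v) / math.cos(v) ** (1.0 / alpha)
            * (math.cos(v - alpha * v) / w) ** ((1.0 - alpha) / alpha))

def heavy_tailed_sgd(grad, w0, lr=0.01, noise_scale=0.01, alpha=1.8,
                     steps=1000, seed=0):
    """SGD with additive heavy-tailed noise injected into the iterates,
    the kind of modification the abstract proposes to provably induce
    compressibility of the learned parameters."""
    rng = random.Random(seed)
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w) + noise_scale * stable_noise(alpha, rng)
    return w

w_noisy = heavy_tailed_sgd(lambda w: 2.0 * (w - 3.0), w0=0.0)  # quadratic at 3
w_clean = heavy_tailed_sgd(lambda w: 2.0 * (w - 3.0), w0=0.0, noise_scale=0.0)
print(math.isfinite(w_noisy), abs(w_clean - 3.0) < 1e-6)
```

    With `noise_scale=0.0` the recursion is plain gradient descent and converges to the minimizer; with the stable noise on, the iterates fluctuate around it with occasional large jumps, and it is this heavy-tailed stationary behavior that the paper's propagation-of-chaos analysis exploits.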
    Nonparametric regression using over-parameterized shallow ReLU neural networks. (arXiv:2306.08321v1 [stat.ML])
    It is shown that over-parameterized neural networks can achieve minimax optimal rates of convergence (up to logarithmic factors) for learning functions from certain smooth function classes, if the weights are suitably constrained or regularized. Specifically, we consider the nonparametric regression of estimating an unknown $d$-variate function by using shallow ReLU neural networks. It is assumed that the regression function is from the H\"older space with smoothness $\alpha<(d+3)/2$ or a variation space corresponding to shallow neural networks, which can be viewed as an infinitely wide neural network. In this setting, we prove that least squares estimators based on shallow neural networks with certain norm constraints on the weights are minimax optimal, if the network width is sufficiently large. As a byproduct, we derive a new size-independent bound for the local Rademacher complexity of shallow ReLU neural networks, which may be of independent interest.  ( 2 min )
    Differentially Private Wireless Federated Learning Using Orthogonal Sequences. (arXiv:2306.08280v1 [cs.IT])
    We propose a novel privacy-preserving uplink over-the-air computation (AirComp) method, termed FLORAS, for single-input single-output (SISO) wireless federated learning (FL) systems. From the communication design perspective, FLORAS eliminates the requirement of channel state information at the transmitters (CSIT) by leveraging the properties of orthogonal sequences. From the privacy perspective, we prove that FLORAS can offer both item-level and client-level differential privacy (DP) guarantees. Moreover, by adjusting the system parameters, FLORAS can flexibly achieve different DP levels at no additional cost. A novel FL convergence bound is derived which, combined with the privacy guarantees, allows for a smooth tradeoff between convergence rate and differential privacy levels. Numerical results demonstrate the advantages of FLORAS compared with the baseline AirComp method, and validate that our analytical results can guide the design of privacy-preserving FL with different tradeoff requirements on the model convergence and privacy levels.  ( 2 min )
    Unbiased Learning of Deep Generative Models with Structured Discrete Representations. (arXiv:2306.08230v1 [cs.LG])
    By composing graphical models with deep learning architectures, we learn generative models with the strengths of both frameworks. The structured variational autoencoder (SVAE) inherits structure and interpretability from graphical models, and flexible likelihoods for high-dimensional data from deep learning, but poses substantial optimization challenges. We propose novel algorithms for learning SVAEs, and are the first to demonstrate the SVAE's ability to handle multimodal uncertainty when data is missing by incorporating discrete latent variables. Our memory-efficient implicit differentiation scheme makes the SVAE tractable to learn via gradient descent, while demonstrating robustness to incomplete optimization. To more rapidly learn accurate graphical model parameters, we derive a method for computing natural gradients without manual derivations, which avoids biases found in prior work. These optimization innovations enable the first comparisons of the SVAE to state-of-the-art time series models, where the SVAE performs competitively while learning interpretable and structured discrete data representations.  ( 2 min )
    Bayesian Non-linear Latent Variable Modeling via Random Fourier Features. (arXiv:2306.08352v1 [stat.ML])
    The Gaussian process latent variable model (GPLVM) is a popular probabilistic method used for nonlinear dimension reduction, matrix factorization, and state-space modeling. Inference for GPLVMs is computationally tractable only when the data likelihood is Gaussian. Moreover, inference for GPLVMs has typically been restricted to obtaining maximum a posteriori point estimates, which can lead to overfitting, or variational approximations, which mischaracterize the posterior uncertainty. Here, we present a method to perform Markov chain Monte Carlo (MCMC) inference for generalized Bayesian nonlinear latent variable modeling. The crucial insight necessary to generalize GPLVMs to arbitrary observation models is that we approximate the kernel function in the Gaussian process mappings with random Fourier features; this allows us to compute the gradient of the posterior in closed form with respect to the latent variables. We show that we can generalize GPLVMs to non-Gaussian observations, such as Poisson, negative binomial, and multinomial distributions, using our random feature latent variable model (RFLVM). Our generalized RFLVMs perform on par with state-of-the-art latent variable models on a wide range of applications, including motion capture, images, and text data for the purpose of estimating the latent structure and imputing the missing data of these complex data sets.  ( 2 min )
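    The crucial ingredient, approximating the GP kernel with random Fourier features so that posterior gradients become available in closed form, rests on the standard RFF identity $z(x)^\top z(y) \approx k(x, y)$. A self-contained sketch for the one-dimensional RBF kernel (feature count and lengthscale are arbitrary):

```python
import math
import random

def rff_features(x, ws, bs):
    """Random Fourier feature map approximating the RBF kernel:
    z(x)_i = sqrt(2/D) * cos(w_i * x + b_i), with w_i ~ N(0, 1/lengthscale^2)
    and b_i ~ Uniform[0, 2*pi]. Then z(x) . z(y) ~ k(x, y)."""
    d = len(ws)
    return [math.sqrt(2.0 / d) * math.cos(w * x + b) for w, b in zip(ws, bs)]

rng = random.Random(1)
D, lengthscale = 2000, 1.0
ws = [rng.gauss(0.0, 1.0 / lengthscale) for _ in range(D)]
bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(D)]

x, y = 0.3, 1.1
zx, zy = rff_features(x, ws, bs), rff_features(y, ws, bs)
approx = sum(a * b for a, b in zip(zx, zy))
exact = math.exp(-((x - y) ** 2) / (2.0 * lengthscale ** 2))
print(abs(approx - exact) < 0.1)  # Monte Carlo error shrinks as D grows
```

    Replacing the kernel with this explicit finite-dimensional feature map turns the GP mapping into an ordinary linear model in `z(x)`, which is what makes gradients with respect to the latent variables tractable for MCMC under non-Gaussian likelihoods.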
    Nearly Optimal Algorithms with Sublinear Computational Complexity for Online Kernel Regression. (arXiv:2306.08320v1 [cs.LG])
    The trade-off between regret and computational cost is a fundamental problem for online kernel regression, and previous algorithms addressing this trade-off could not maintain optimal regret bounds at sublinear computational complexity. In this paper, we propose two new algorithms, AOGD-ALD and NONS-ALD, which can keep nearly optimal regret bounds at a sublinear computational complexity, and give sufficient conditions under which our algorithms work. Both algorithms dynamically maintain a set of nearly orthogonal basis functions used to approximate the kernel mapping, and keep nearly optimal regret bounds by controlling the approximation error. The number of basis functions depends on the approximation error and the decay rate of the eigenvalues of the kernel matrix. If the eigenvalues decay exponentially, then AOGD-ALD and NONS-ALD achieve regrets of $O(\sqrt{L(f)})$ and $O(\mathrm{d}_{\mathrm{eff}}(\mu)\ln{T})$, respectively, at a computational complexity in $O(\ln^2{T})$. If the eigenvalues decay polynomially with degree $p\geq 1$, then our algorithms keep the same regret bounds at a computational complexity in $o(T)$ in the case of $p>4$ and $p\geq 10$, respectively. $L(f)$ is the cumulative loss of $f$ and $\mathrm{d}_{\mathrm{eff}}(\mu)$ is the effective dimension of the problem. The two regret bounds are nearly optimal and are not comparable.  ( 2 min )

  • Open

    How to install Bark with voice cloning locally?
    I want to install Bark with voice cloning locally, and I cannot find any installation help for this. Can someone please provide step-by-step instructions? I tried the Colab, but it is very limited and I can't get it to work properly; it would be much easier if I just set it up locally. I also don't know of any scripts for local installs, so if someone could point me to a Python script that uses a custom voice with Bark and runs the TTS, that'd be great too. submitted by /u/SimRacer101 [link] [comments]  ( 8 min )
    Philippines: Tech Savvy Kids use AI to catch a thief caught on video
    Videos and pics taken from a Facebook post. Yesterday, in my hometown of Bacolod City, Philippines, a man stole a bag from two performing kids on stage. The theft was caught on video by one of the venue's security cameras. Commentators on social media used artificial intelligence to sharpen the image of the thief and sent it to the kids, who then gave it to the police. The authorities were able to recover the bag, but one cellphone was missing. The suspect is identified but still at large. https://i.redd.it/v80i1p3dln6b1.gif Link to original post and video: https://www.facebook.com/lexchavezofficial/videos/1307441943456719/ AI-sharpened image. The update, translated as best I could and summarized: "Thank you to all who helped and shared. We tried to secure our belongings but bad things still happened. Our bag was stolen with 2 cellphones, a wallet, cameras and a microphone. The police helped us recover the bag, but one cellphone was missing, as was the money from the wallet." https://www.facebook.com/lexchavezofficial/posts/pfbid02hQGmL8Q36ZVHPXXNkw3W26RFTuzeaGCNzLTBA5nQpqx9r1q9uS1itWeBUGFHYt2Fl submitted by /u/mrlechon [link] [comments]  ( 8 min )
    Would everyone use AI differently if your phone/computer was the power used to complete the task asked of it..?
    As above, so below... submitted by /u/Slugity [link] [comments]  ( 8 min )
    How to setup GPT Engineer to run from command line (quick tutorial)
    submitted by /u/mutantdustbunny [link] [comments]  ( 8 min )
    With voice cloning, is it possible to make the voice sound like a natural speaker in a different language?
    I'm looking into transcripting audio and having the speaker be voice cloned so i can have it speak in a different language. My question is does it speak unnatural, and what are the best programs available to use free or otherwise? submitted by /u/MrSneaky64 [link] [comments]  ( 8 min )
    I made a sci-fi short film about the dangers of AI using AI generated video
    submitted by /u/bopili [link] [comments]  ( 8 min )
    Day 8: I did some research and experimented with different prompts on Bing Image Creator
    ​ https://preview.redd.it/kx64m56ntm6b1.jpg?width=1024&format=pjpg&auto=webp&s=0282bd3062215ba220009ee76ccfcc7f0f8b9d04 https://preview.redd.it/yt8f126ntm6b1.jpg?width=1024&format=pjpg&auto=webp&s=22c20a4d7f6574fb8bb6ee0146eb3f254367d16c https://preview.redd.it/4groo16ntm6b1.jpg?width=1024&format=pjpg&auto=webp&s=46087087557e8e2ed7148e13c6dc12a06a10dcfb https://preview.redd.it/okm8jn6ntm6b1.jpg?width=1024&format=pjpg&auto=webp&s=bd36ba5923e937aa2a209d1af825fae00e9548ae https://preview.redd.it/5pepmq6ntm6b1.jpg?width=1024&format=pjpg&auto=webp&s=3b20d02e52bbdb988309184fa40d65bc075b7d4e https://preview.redd.it/arj38r6ntm6b1.jpg?width=1024&format=pjpg&auto=webp&s=db6ef0bd49536761c6d384e9b63084d772beaaa4 https://preview.redd.it/1127376ntm6b1.jpg?width=1024&format=pjpg&auto=webp&s=735898d6eeb65726129cfed370d9ab0782e1fac4 submitted by /u/Blaze_furyX [link] [comments]  ( 8 min )
    Day 8: I did some research and experimented with different prompts on Bing Image Creator
    ​ https://preview.redd.it/yhi8x4egtm6b1.jpg?width=1024&format=pjpg&auto=webp&s=4aa067fa7681fe50a2a877dfef1bf5fa30d88e32 https://preview.redd.it/3316wjdgtm6b1.jpg?width=1024&format=pjpg&auto=webp&s=990b9973bdd7777def3f044e5ed933040ce6320b https://preview.redd.it/0o7mv5egtm6b1.jpg?width=1024&format=pjpg&auto=webp&s=96110417a2c4b42cfae2b744957d64be40ebc7b8 https://preview.redd.it/p8t0d6egtm6b1.jpg?width=1024&format=pjpg&auto=webp&s=22e1914b32329a020ab389b7c20fa55597c9bbe1 https://preview.redd.it/es1uu6egtm6b1.jpg?width=1024&format=pjpg&auto=webp&s=56bf96dc3109a8c7c7b0ea477bb230dcaac966ac https://preview.redd.it/ldo3j8egtm6b1.jpg?width=1024&format=pjpg&auto=webp&s=6521986816015c3dba265f7f6a08442620143b8a https://preview.redd.it/9kk6vodgtm6b1.jpg?width=1024&format=pjpg&auto=webp&s=d432b048c874d4fe2110efb1656678888742b958 https://preview.redd.it/3mazuaegtm6b1.jpg?width=1024&format=pjpg&auto=webp&s=9e8f54868ce7830532c189b41d92169ff5ec7ee5 submitted by /u/Blaze_furyX [link] [comments]  ( 8 min )
    AI for food insecurity and affordable healthy diets.
    Hey everyone! I’m giving a presentation at work about the potential of AI and how it could be helpful to certain communities. Most of the tools I’ve noticed are super helpful when it comes to office work or productivity, but I wanted to ask if y’all think it’s possible for an AI model to be fine-tuned on food recipes and to recommend meals and snacks based on the user's preferences. For example, it asks what my monthly budget is, my caloric requirement, allergies, and what I do or don't already own, and pulls prices from available grocery sites. I just know from experience that I didn’t really know of calorie-dense foods in undergrad and could have been eating healthier while also getting the necessary energy and saving money. Do you think AI is at that point yet? submitted by /u/Content_Test_8910 [link] [comments]  ( 8 min )
    Anyone know what AI tool he used to make Spider Man Miles Morales look like this?
    submitted by /u/Evannnnnm [link] [comments]  ( 8 min )
    ‘Like science fiction,’ Seattle startup sends laser-equipped robots to zap weeds on farmland
    submitted by /u/JohnnyMnemo [link] [comments]  ( 8 min )
    Descript alternative for voice cloning a video game character?
    Just been learning Descript and spent hours cutting up video game footage to get 10 minutes of a character talking to test out voice cloning to make the character say different things. When I submit training data, there's a disclaimer about only using your own voice. I didn't realise this but I understand it makes sense. My question is, are there any other alternatives? For example, how are these AI songs getting put out there? submitted by /u/SupremeFlamer [link] [comments]  ( 8 min )
    AI tool for video resolution upscale?
    Does anyone know of a video resolution upscaling AI? There is "nightmareai" for upscaling photos, and I'm wondering if something like that exists for videos as well. submitted by /u/ZipoxD [link] [comments]  ( 8 min )
    I've Been Working On Something Exciting and I Want To Ask For Your Feedback
    So imagine this scenario: you upload a picture of your FACE and create an AI DIGITAL HUMAN of yourself. I was thinking about new ideas to add to my Chat with PDF tool. An idea struck me: wouldn't it be awesome if you could create an AI digital human of yourself and ask questions of your AI twin instead of a faceless ChatGPT? 💡 Imagine being able to ask questions to your own AI twin that gives you answers from your uploaded PDFs. This idea looked interesting to me, so I have started working on it. https://preview.redd.it/wxby7wqa8l6b1.jpg?width=955&format=pjpg&auto=webp&s=a5f6f2e73d29d130ad1609777e3c44c4ea779521 I'm eager to hear your thoughts on this new feature! Will it succeed or fail? submitted by /u/Brian-Hose225 [link] [comments]  ( 8 min )
    Is OpenAI's finetunable models what I need to create a tech support tool?
    Hello! I would like to train OpenAI's Davinci model on a company's knowledge base, which consists of a few hundred articles and transcribed videos. I want to use this model as a chat support helper tool that understands the user's questions and feeds me intelligent and factually correct replies. I provide tech support and would like to make my work easier. Their documentation on fine-tuning makes it seem like it's more about learning entirely new skills than just adding more information. It recommends using multiple examples, and the case study of a customer support chatbot uses a different structure from what I need: {"prompt":"Summary: \n\nSpecific information:\n\n###\n\nCustomer: \nAgent: \nCustomer: \nAgent:", "completion":" \n"} {"prompt":"Summary: \n\nSpecific information:\n\n###\n\nCustomer: \nAgent: \nCustomer: \nAgent: \nCustomer: \nAgent:", "completion":" \n"} Are their fine-tunable models what I'm looking for, or should I be using something else? What prompt structure should I use if I want it to feed answers to questions like ChatGPT would? Thanks a lot! submitted by /u/JoZeHgS [link] [comments]  ( 8 min )
    An optimistic outlook on AI
    Now, I'd like to start off by saying that, apart from a surface-level understanding of computers, technology, and artificial intelligence, I am not qualified to speak on this subject at all. But I feel like most AI-focused subs (especially r/singularity) have a very pessimistic and dystopian prediction for the future of this technology, and I'd like to throw some ideas out in the other direction because I'm interested to see if anyone else has some polarising views. I would describe myself as a creative person, so that's where most of my thoughts stem from. For example, I can imagine a future where AI-generated media makes it possible for anyone to make anything, which will of course be terrible at first. I can definitely imagine someone shitposting an entire movie or something, but once the initial no…  ( 9 min )
    Generative fill
    Are there any other free generative fill AIs similar to Photoshop's and DALL·E's outpainting? submitted by /u/Boatetye [link] [comments]  ( 8 min )
    Best free voice cloning?
    I really want to clone some voices, and the only good option is ElevenLabs, but that is paid. I tried Tortoise, but it was pretty bad. I am looking for an ElevenLabs competitor that is free/freemium. submitted by /u/SimRacer101 [link] [comments]  ( 8 min )
  • Open

    47.9% Robust Test Error on CIFAR10 with Adversarial Training and PyTorch
    Knowing how to compute adversarial examples from this previous article, it would be ideal to train models for which such adversarial examples do not exist. This is the goal of developing adversarially robust training procedures. In this article, I want to describe a particularly popular approach called adversarial training. The idea is to train on adversarial examples computed during training on-the-fly. I will also discuss a PyTorch implementation that obtains 47.9% robust test error — 52.1% robust accuracy — on CIFAR10 using a WRN-28-10 architecture. The post 47.9% Robust Test Error on CIFAR10 with Adversarial Training and PyTorch appeared first on David Stutz.  ( 10 min )
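    The on-the-fly idea in miniature: the sketch below (a hand-rolled logistic regression with FGSM perturbations on toy data, not the WRN-28-10/CIFAR10 setup from the article) generates a fresh adversarial example for every training step and takes the gradient step on that instead of the clean input:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2D data: class 0 centered at (-1,-1), class 1 at (+1,+1).
y = rng.integers(0, 2, size=200)
X = rng.normal(scale=0.5, size=(200, 2)) + (2 * y[:, None] - 1)

w, b = np.zeros(2), 0.0
eps, lr = 0.1, 0.5  # L_inf attack radius and learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(200):
    # FGSM: for a linear model the input-gradient of the loss is (p - y) * w,
    # so the worst-case L_inf perturbation is eps * sign of that.
    p = sigmoid(X @ w + b)
    X_adv = X + eps * np.sign((p - y)[:, None] * w)
    # Adversarial training: the gradient step uses the perturbed inputs.
    p_adv = sigmoid(X_adv @ w + b)
    w -= lr * (X_adv.T @ (p_adv - y)) / len(y)
    b -= lr * np.mean(p_adv - y)

# Robust accuracy: accuracy on FGSM-perturbed inputs after training.
p = sigmoid(X @ w + b)
X_adv = X + eps * np.sign((p - y)[:, None] * w)
robust_acc = np.mean((sigmoid(X_adv @ w + b) > 0.5) == y)
print(f"robust accuracy: {robust_acc:.2f}")
```

    PGD-based adversarial training, as used in the article, replaces the single FGSM step with several projected gradient steps inside the epsilon ball.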
  • Open

    Set of orbits with the same average distance to sun
    Suppose a planet is in an elliptical orbit around the sun with semimajor axis a and  semiminor axis b. Then the average distance of the planet to the sun over time equals a(1 + e²/2) where the eccentricity e satisfies e² = 1 − b²/a². You can find a proof of this statement in [1]. […] Set of orbits with the same average distance to sun first appeared on John D. Cook.  ( 5 min )
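    The stated formula is easy to check numerically: with r = a(1 − e cos E) and dt proportional to (1 − e cos E) dE (from Kepler's equation), the time average of r becomes a ratio of two integrals over the eccentric anomaly E. A quick midpoint-rule check:

```python
import math

def time_averaged_distance(a, b, n=20000):
    """Time average of r over one orbit, integrating over the
    eccentric anomaly E with weight dt/dE proportional to (1 - e cos E)."""
    e = math.sqrt(1 - (b / a) ** 2)
    num = den = 0.0
    for i in range(n):
        E = 2 * math.pi * (i + 0.5) / n
        w = 1 - e * math.cos(E)   # dt/dE up to a constant factor
        num += a * w * w          # r * (dt/dE)
        den += w                  # dt/dE
    return num / den

a, b = 1.0, 0.6
avg = time_averaged_distance(a, b)
e2 = 1 - (b / a) ** 2
expected = a * (1 + e2 / 2)
print(avg, expected)
```

    The two numbers agree to machine precision, since the midpoint rule is spectrally accurate for smooth periodic integrands.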
  • Open

    The emoji of the future
    Some of the recent image-generating models have this thing where they can fill in the blank parts of images. It's handy when you want to show them exactly how to give you more of the same. Like these animal emoji. See if you can tell which ones I  ( 3 min )
    Bonus: More extrapolated emoji
    AI Weirdness: the strange side of machine learning  ( 2 min )
  • Open

    Could someone make colab of this awesome repo for Blind super resolution?
    GRL-Image-Restoration. I tried it on video files through the HolyWu VapourSynth port and the results are awesome. Could someone make a Colab of this repo for blind super-resolution, or write clear instructions on how to set it up and run inference in an Anaconda environment? Thank you very, very much in advance. submitted by /u/zelenooki87 [link] [comments]  ( 8 min )
    How Does an Activation Function Help Fit Non-Linear Data?
    I am having trouble understanding how an activation function helps in fitting a non-linear function. Here is a visualization of two neurons with ReLU: https://pasteboard.co/LRbI8pjVlSRC.jpg https://www.desmos.com/calculator/y4jsrd2hms submitted by /u/god00speed [link] [comments]  ( 8 min )
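    One way to see it: a linear combination of linear functions is still linear, but ReLU lets each hidden unit switch off below its own kink, so even two units can build a non-monotone "hat". A tiny hand-wired example (weights chosen by hand, not trained):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Two hidden ReLU neurons combined by a linear output layer. Without ReLU
# this collapses to a single straight line; with ReLU it has kinks at
# x = 0 and x = 1, which is what lets the network bend to fit curved data.
def tiny_net(x, use_relu=True):
    act = relu if use_relu else (lambda z: z)
    h1 = act(x)          # kink at x = 0
    h2 = act(x - 1.0)    # kink at x = 1
    return h1 - 2.0 * h2  # rises on [0, 1], falls after x = 1

xs = np.linspace(-1, 2, 7)
print([round(float(tiny_net(x)), 2) for x in xs])
print([round(float(tiny_net(x, use_relu=False)), 2) for x in xs])
```

    The first row goes up and then back down; the second (no activation) is a straight line, no matter how many layers you stack.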
  • Open

    Question about Transformer model input in RL
    Hello there! I'm currently trying to implement the GTrXL model with PPO, and I have a question regarding the correct input format for the model. After going through some of the source code, I noticed that the forward input is expected to have the shape `(batch size, 1, state size)`, where 1 represents the sequence length. I'm curious to know if this input format is the correct one for the model to function properly, because I thought that, in order for a transformer model to work, the sequence length would at least have to be one or a few episodes long. Thanks! submitted by /u/dinhdat1110 [link] [comments]  ( 8 min )
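    A sequence length of 1 indeed gives the attention nothing to attend over. A common alternative (sketched here with hypothetical sizes, not GTrXL's actual API) is to feed each timestep a sliding window of the last few states, giving the usual `(batch, seq_len, state_size)` input:

```python
import numpy as np

# Hypothetical rollout: T timesteps of a state_size-dimensional state.
T, state_size, seq_len = 10, 4, 3
states = np.arange(T * state_size, dtype=float).reshape(T, state_size)

# For each timestep t, the model sees the last `seq_len` states,
# zero-padded at the start of the episode.
padded = np.concatenate([np.zeros((seq_len - 1, state_size)), states])
windows = np.stack([padded[t:t + seq_len] for t in range(T)])

print(windows.shape)  # one window per timestep: (batch, seq_len, state_size)
```

    GTrXL-style models instead carry a recurrent memory of past activations across segments, which serves the same purpose without re-feeding raw states.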

  • Open

    One-Minute Daily AI News 6/16/2023
    Chinese tech giants ByteDance (TikTok), Tencent, Baidu, and Alibaba simply can’t get enough of Nvidia’s High-Performance Computing (HPC) products. This news comes from Chinese media, which reports that TikTok creator, ByteDance, alone, has already caught up (in pure dollar terms) to what the entire Chinese market ordered from Nvidia in 2022.[1] Chinese President Xi Jinping discussed the global rise of artificial intelligence with Bill Gates on Friday and said he welcomed U.S. firms including Microsoft bringing their AI tech to China.[2] Today, Meta has announced its latest generative AI model, following on the back of ImageBind is Voicebox, which is designed to help creators with its ability to perform speech generation tasks such as audio editing, sampling, and stylizing, even if it wasn’t specifically trained to do so through in-context learning.[3] AI-generated ‘Family Guy’ livestream banned after making a bomb threat.[4] Sources: [1] https://www.tomshardware.com/news/chinas-bytedance-has-gobbled-up-dollar1-billion-of-nvidia-gpus-for-ai-this-year [2] https://www.reuters.com/technology/chinas-xi-tells-bill-gates-he-welcomes-us-ai-tech-china-2023-06-16/ [3] https://www.neowin.net/news/meta-announces-voicebox-its-generative-ai-model-for-audio/ [4] https://www.nme.com/news/tv/ai-generated-family-guy-livestream-banned-after-making-a-bomb-threat-3457051 submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    I think everyone should watch this video
    While I think AI advancements are crazy, I think this video puts a lot in perspective. Just casually browsing, I think we are echo-chambering how AI is about to take over, etc. We should definitely look at the future and maybe talk about putting protections in place, but while we are closer than ever, we are still nowhere near real general AI without another massive leap. https://www.youtube.com/watch?v=l7tWoPk25yU The video talks about the issues they found with the AI Go bot that was "unbeatable" but actually didn't even understand the most basic levels of the game, even though it was destroying grandmasters. submitted by /u/phatrequiem [link] [comments]  ( 8 min )
    What’s the best ai to generate accurate hands and feet???
    I have been looking around and can’t find anything that looks “human” submitted by /u/ExaminationNo6235 [link] [comments]  ( 8 min )
    Day 7: I did some research and experimented with different prompts on Bing Image Creator
    ​ https://preview.redd.it/esvzoywjrf6b1.jpg?width=1024&format=pjpg&auto=webp&s=d8377cc8d55a929661e1bfa4f1364ffb699898f1 https://preview.redd.it/cd621lxjrf6b1.jpg?width=1024&format=pjpg&auto=webp&s=5a5a14b7155401713381a47c6e9e118fe1dbfbcf https://preview.redd.it/jl9yozwjrf6b1.jpg?width=1024&format=pjpg&auto=webp&s=e10b025891c22af7ad1d111aedb0d16f4769cfd5 https://preview.redd.it/llpz5mxjrf6b1.jpg?width=1024&format=pjpg&auto=webp&s=f0b0d94317df71f341f7c67de9a3c4ddd4d264bb https://preview.redd.it/2065vlxjrf6b1.jpg?width=1024&format=pjpg&auto=webp&s=71a00e14946118ccbef4e7d4211985c3f8ebdcf6 https://preview.redd.it/zrj75txjrf6b1.jpg?width=1024&format=pjpg&auto=webp&s=7685fbd05bca9ec8946ab0e2b7c83125bf1dfe0b submitted by /u/Blaze_furyX [link] [comments]  ( 8 min )
    Day 7: I did some research and experimented with different prompts on Bing Image Creator
    ​ https://preview.redd.it/qy3lvvxarf6b1.jpg?width=1024&format=pjpg&auto=webp&s=bbdd1c1748ae2bb9da79248a680006cd544688d4 https://preview.redd.it/747lcryarf6b1.jpg?width=1024&format=pjpg&auto=webp&s=3a82e42dd63b4913ae21d61c7922ea131bd619f8 https://preview.redd.it/d9wulxxarf6b1.jpg?width=1024&format=pjpg&auto=webp&s=0c4a9f1e7e73353c52ac95807a8563971b32fb89 https://preview.redd.it/o6m9sjyarf6b1.jpg?width=1024&format=pjpg&auto=webp&s=e672894dc8525f9b0d0107f5015ee0b3a04e74e7 https://preview.redd.it/k37gqjyarf6b1.jpg?width=1024&format=pjpg&auto=webp&s=fcd42a2231a41018b217eb55546efc1ba7d11f27 https://preview.redd.it/q6gvt5zarf6b1.jpg?width=1024&format=pjpg&auto=webp&s=1b5d1c1067ea10986e1f12c07f169f147dcb449e https://preview.redd.it/xfc2kjyarf6b1.jpg?width=1024&format=pjpg&auto=webp&s=af09ef9c7302230f63a370292c05e5074c648816 https://preview.redd.it/onflyjyarf6b1.jpg?width=1024&format=pjpg&auto=webp&s=19f1c34218528d145bba5dc2bc0e7219074b0955 submitted by /u/Blaze_furyX [link] [comments]  ( 8 min )
    Will AI replace Accounting Jobs?
    Sorry if you get plenty of these questions, but I am looking into going off to university/college, and one of my options is accounting. However, I'm concerned that, with the rise of AI over the past few months, this job field might soon be gone. Am I just needlessly worrying? submitted by /u/Snow_Mexican1 [link] [comments]  ( 8 min )
    How can one use data and AI to test the hypothesis that "people like to eat what they had at parties as kids"?
    Basically, the hypothesis is that we tend to like foods that we had as a group when we were kids: for example, birthday-party cake, candies, and ice cream. Juice and cupcakes are all party foods. If kids' party foods change to healthier options, like naturally colored salads and vegetables with multi-grain bread, then people's actual tastes will also change. submitted by /u/holihai [link] [comments]  ( 8 min )
    I'm looking for an AI tool that would create a basic financial forecast. Is there anything out there?
    submitted by /u/Dalembert [link] [comments]  ( 8 min )
    AI characters, without any content filtering (TruePerson AI)
    TruePerson AI, chat with Uncensored AI Characters Because of the many content restrictions of AI in general, we made an AI chat app that is able to bypass content limitations. The AI in itself is based upon GPT. However, it's been modified to be capable of generating any content and be as neutral as possible. On the app, you can choose among hundreds of AI characters and even create your own. For now, the results are just fine, but we are eager to hear more about your experience. Feel free to exploit it and push its limits ! 🤖 Download on Play Store 🍏 Download on App Store 💜 Join our community Wanna push the experience further ? App is free to use, but premium offers are available that let you chat without having to worry about credits. submitted by /u/Fayerdd [link] [comments]  ( 8 min )
    AI — weekly megathread!
    This week in AI - partnered with aibrews.com feel free to follow their newsletter News & Insights ElevenLabs has launched AI Speech Classifier - an authentication tool that lets you upload any audio sample to identify if it contains ElevenLabs AI-generated audio [Details]. Nvidia Research presents SceneScape - a method to generate long-term walkthroughs in imaginary scenes just from an input text prompt [Details |Paper ]. Meta AI introduces the Image Joint Embedding Predictive Architecture (I-JEPA), a new AI model which learns from the world like humans and excels in computer vision tasks, while being more computationally efficient. It learns by creating an internal model of the outside world, which compares abstract representations of images (rather than comparing the pixels themsel…  ( 10 min )
    The AI singularity is nearer than we think?
    Hey folks, I’ve been mulling over something. Every other day I see some article about more progress in AI, and it just doesn’t seem to stop. Just look at the AI models today, they're getting massive, and their capabilities are already borderline scary. How much longer before the lines between AI and human blur, or even become indistinguishable? Take gpt-4 or Anthropic’s Claude, for instance, they're so much better at understanding and even generating human-like text. Not to mention, these models are conditioned to avoid certain topics and tasks. I don’t think people realize that AutoGPT doesn’t work well because OpenAI and Anthropic have trained their AI’s not to be autonomous. Could we possibly already be in the AI singularity? submitted by /u/Emotional_Ratio_3251 [link] [comments]  ( 8 min )
    IBM Research: The 100,000 Qubit Quantum-Centric Supercomputer of 2033
    submitted by /u/linebell [link] [comments]  ( 8 min )
    [Question/Discussion] Ideas on Whisper AI hallucinating
    I used Whisper AI (openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision (github.com)) to transcribe the Common Voice data set (Common Voice (mozilla.org)) for a language and noted that the 'tiny' model hallucinates a lot, whereas the bigger 'small' model almost does not hallucinate at all, and the mid-sized 'base' model (between 'tiny' and 'small') hallucinates more than the 'small' model. Furthermore, the general performance of the small model is better than both the tiny and base models. As a side note, the data instances in this data set are sentences worth about 5-10 seconds of audio. I am mostly interested in your thoughts on why a larger model does not necessarily perform better and may hallucinate more. I did not change the temperature or other settings when transcribing. I can imagine a larger model might overfit, which could cause this phenomenon, but I would like to know what you think might be the cause of the lower performance with more hallucinations. For context: I am doing research for my master's thesis, so any ideas are welcome! submitted by /u/Rikker_33 [link] [comments]  ( 8 min )
    Using chatGPT as a teacher
    Hello everyone, I'm looking into the impact of ChatGPT as a teaching aid for educators. Please let me know how you're leveraging ChatGPT or other AI tools to reduce your workload or save yourself from rote work. Also, what tasks do you use it for most frequently, and are there any challenges? If you're willing to help, leave your answers here: https://forms.gle/3SRRToFtVDCaGMyv9 submitted by /u/Winter-Mud9223 [link] [comments]  ( 8 min )
    Custom-AI App
    I need your expertise! I'm working on creating sets of Earth Science practice questions with multiple-choice answers for high school students. Some of these sets require images/data tables/graphs that students will reference to answer the question. Essential requirements: questions should test the application of knowledge, not just recall; question difficulty and format must be consistent; answer options should be plausible, distinct, and similar in length and detail. Here's an example: INSTEAD OF "How do scientists calculate the age of a star?" (a content-recall question) → "Scientists calculate the age of a star by using Hertzsprung-Russell diagrams. These diagrams have the temperature of stars plotted against their brightness. Using the following graph (insert image), approximate the age of a star that is x degrees in temperature and x in brightness." I've tried using GPT-4; however, I'm finding that it doesn't remember the criteria for quality questions and frequently gives questions that rely on recall. Additionally, ChatGPT cannot generate the images or graphs that I might use as stimuli to accompany the questions. I have basic programming knowledge, and I'm wondering if it's possible to create an automated tool to generate these questions and answers and incorporate relevant images. Which platforms or programming languages would be best for this task? submitted by /u/guppyguyco [link] [comments]  ( 8 min )
    What's the best free AI text-to-speech tool with unlimited characters?
    I run a YouTube channel where I use text-to-speech, and I can't seem to find a good free platform with unlimited characters. submitted by /u/Disastrous_Loan753 [link] [comments]  ( 8 min )
    I am a new photographer and videographer and would like to dive deep into AI as a tool to help me in my career. Are there any courses that could perhaps give me qualifications, or just knowledge?
    I have modest skills with Adobe Lightroom and Premiere, but I have never touched Photoshop and would like to learn it. I'm also interested in AI helping with concept ideas, hashtag generation, and the infinite possibilities this new technology can bring. AI is undeniably the future, and I wish to grow with it. submitted by /u/Spaktor [link] [comments]  ( 8 min )
    If Elvis Presley had been a woman [2048 x 2047]
    submitted by /u/Arcapelian [link] [comments]  ( 8 min )
    Document Organizer
    Hi everyone, I'm looking for a tool that can help me with organizing my documents. I keep track of a lot of companies, with different associated documentation for each. What I would like is a tool that can automatically tag documents with relevant information (e.g. companyName, taxReturn, capitalizationTable). I would like to structure the information similar to something like Obsidian or Notion.ai. After the organization, I would like to be able to query the information (e.g. what revenue did companies based in the Netherlands make in 2022?), although I know this is a bit more difficult. Does something like this exist? submitted by /u/twigssc [link] [comments]  ( 8 min )
    KubulaBot Your Chat Companion (ChatGPT-based Android App)
    submitted by /u/djquimoso [link] [comments]  ( 8 min )
    Game Devs on how AI will Change Gaming Forever
    submitted by /u/GuyTDraker [link] [comments]  ( 8 min )
    One-Minute Daily AI News 6/15/2023
    AI-powered robots are giving eyelash extensions. It’s cheaper and quicker. LUUM, a beauty studio in Oakland, Calif., uses robots to give clients false eyelash extensions using AI technology.[1] German automaker Mercedes-Benz announced Thursday that it will add OpenAI’s ChatGPT chatbot to its cars via a beta program for the Mercedes-Benz User Experience (MBUX) feature in its vehicles, enabling AI-driven voice commands and additional functionality.[2] AI will be used in southwest England to predict pollution before it happens and help prevent it. It’s hoped the pilot project in Devon will help improve water quality at the seaside resort of Combe Martin, making it a better place for swimming.[3] Freshworks CEO Girish Mathrubootham joins Caroline Hyde and Ed Ludlow to discuss how the company’s latest products are leveraging generative AI, why it is important to democratize access to the power of AI, and why India is a force to look out for in AI innovation.[4] Sources: [1] https://www.washingtonpost.com/technology/2023/06/10/ai-technology-eyelash-extensions/ [2] https://decrypt.co/144872/mercedes-benz-adding-chatgpt-cars-ai-voice-commands [3] https://www.bbc.com/news/science-environment-65913940 [4] https://www.bloomberg.com/news/videos/2023-06-15/freshworks-ceo-ai-will-be-great-opportunity-for-india-video submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    Will AI be able to help model human behavior and uncover hidden truths, or confirm theories about ancient civilizations?
    We have a lot of data, say, for ancient Rome. Will we someday be able to create models of human behavior, military strategy, political maneuvering, etc. that allows us to see into history, uncover truths about the past, or confirm theories about ancient civilizations? submitted by /u/roundearthervaxxer [link] [comments]  ( 8 min )
    AI algorithms find drugs that could combat ageing
    submitted by /u/ImmortalWolly [link] [comments]  ( 8 min )
  • Open

    Voicebox From Meta AI Gonna Change Voice Generation & Editing Forever - Can Eliminate ElevenLabs
    submitted by /u/CeFurkan [link] [comments]  ( 8 min )
    Training slows down for batch size > 1; a single operation is the issue. Help.
    Note: I am using an up-to-date version of PyTorch and running on GPU. When I train my model with batch size 1, it goes as expected. When I use batch size 2 (which still fits in GPU memory), there is one specific ConvolutionBackward0 operation that suddenly takes WAAAAAY longer (I mean 10 times...). No other backward operations are affected. Does anyone have suggestions as to why this could be happening? It seems extremely counterintuitive that only one piece of the backward pass breaks. submitted by /u/teduck1 [link] [comments]  ( 8 min )
    [HELP] Train
    Hey guys, I want to fine-tune a Davinci 003 model to write in a specific writing style. The problem is I don't understand how to create the dataset for this. Let's say I want my model to write articles in a very specific structure: what do I need to put as Prompt and Completion in the dataset? For example, an article introduction section: a short explanation of the main topic: "Every Jeep has its own specific codes which are used to notify you that something is wrong. For example, DTC codes like C123F or U1110, etc." The main keyword as a question: "So, what does the C123F code Jeep mean?" A short, direct answer to the question: "Jeep code C123F may come up due to a damaged steering column or intermediate shaft. It can also be due to incorrect positioning of the steering angle sensor or improper clock spring installation. Resetting the ECU, properly installing the clock spring, or replacing the clock spring can fix the issue. That was just the preview. If you want more details on this matter, keep reading this article." How do I train the model to write intros like the above? submitted by /u/buxrmp [link] [comments]  ( 8 min )
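    One way to set this up (a sketch; the topic label and "###" separator are illustrative conventions, not requirements): make the prompt carry the topic plus a short instruction, and make the completion a full intro written in exactly the target structure. Many such pairs, all with the same four-part shape, is what teaches the structure:

```python
import json

# Hypothetical single training example built from the structure in the post:
# explanation -> keyword-as-question -> short direct answer -> transition.
intro = (
    "Every Jeep has its own specific codes used to notify you that "
    "something is wrong, e.g. DTC codes like C123F or U1110. "
    "So, what does the C123F code on a Jeep mean? "
    "Jeep code C123F may come up due to a damaged steering column or "
    "intermediate shaft. "
    "That was just the preview; keep reading for the details."
)
example = {
    "prompt": "Topic: Jeep C123F code\nWrite the article introduction.\n\n###\n\n",
    "completion": " " + intro + "\n",
}
print(json.dumps(example)[:80])
```

    A single example won't do it; dozens to hundreds of pairs in this shape, each on its own JSONL line, are what make the model absorb the structure rather than the specific facts.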
    Two things you need to know about "Tracking Everything Everywhere All at Once" paper
    https://preview.redd.it/4yqq93h07d6b1.jpg?width=2800&format=pjpg&auto=webp&s=2ece3143ae3aac3c93468922ac5c2bc2099a59f0 The guys from OpenCV.ai write some interesting thoughts about the paper "Tracking Everything Everywhere All at Once". They highlight two interesting things: the algorithm is intended to work on the whole video at once (it runs an optimization process given all the video frames, so it is not designed for real-time tracking), and it needs to run an external algorithm for supervision to perform the track optimization process. More details here if you have not read the paper. submitted by /u/No-Independence5880 [link] [comments]  ( 8 min )
  • Open

    Improving Subseasonal Forecasting with Machine Learning
    This content was previously published by Nature Portfolio and Springer Nature Communities on Nature Portfolio Earth and Environment Community. Improving our ability to forecast the weather and climate is of interest to all sectors of the economy and to government agencies from the local to the national level. Weather forecasts zero to ten days ahead and […] The post Improving Subseasonal Forecasting with Machine Learning appeared first on Microsoft Research.  ( 11 min )
  • Open

    Envisioning the future of computing
    MIT students share ideas, aspirations, and vision for how advances in computing stand to transform society in a competition hosted by the Social and Ethical Responsibilities of Computing.  ( 9 min )
  • Open

    Enhancing patient care with AI-powered health monitoring and remote patient management
    Healthcare has undergone consistent changes. From surgeries that were conducted by sedating patients with opium and amputation of an infected part, the technology has evolved to anesthesia and new & innovative ways of treating bacterial infection. We can now conduct LASIK surgeries and have even healed conjoined twins from the head. With so much advancement,… Read More »Enhancing patient care with AI-powered health monitoring and remote patient management The post Enhancing patient care with AI-powered health monitoring and remote patient management appeared first on Data Science Central.  ( 22 min )
  • Open

    Quantifying Signal-to-Noise Ratio in High Variance, Low Reward Improvement Environments
    I am dealing with a class of environments / reward functions that only allow a very slight improvement of the mean reward but have relatively high variance. So basically the distribution of rewards is very wide but shifts only slightly over the course of the training. I know this because I have a pretty good MPC policy that I can run on the environment and see the improvement over the initial policy versus the variance of the rewards. The smaller the ratio of possible reward improvement over the variance of rewards, the harder it becomes for the RL algorithm to learn a good policy, which is plausible. Here is an example calculation for an environment; I had a hard time getting it to converge, and the problem is super sensitive to the hyperparameter settings. Rewards for the initial policy and the MPC policy, the standard deviation of rewards, and the ratio of possible improvement over std(rewards). My question would be: is there an agreed way to quantify this signal-to-noise ratio or ratio of possible improvement? And is there literature investigating this problem, or do you have any experience of what would be a 'good' ratio? submitted by /u/flxh13 [link] [comments]  ( 8 min )
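    One simple way to put a number on it (the returns below are hypothetical; the "episodes needed" line is just the standard-error-of-the-mean argument, not from any specific paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical episode returns: high variance, small mean improvement.
initial_returns = rng.normal(loc=100.0, scale=20.0, size=500)
mpc_returns = rng.normal(loc=104.0, scale=20.0, size=500)

# Headroom of the better policy over the initial one, divided by the
# reward standard deviation: one candidate signal-to-noise ratio.
improvement = mpc_returns.mean() - initial_returns.mean()
snr = improvement / initial_returns.std()
print(f"improvement={improvement:.2f}, snr={snr:.3f}")

# Episodes needed before the standard error of the mean return resolves
# the gap at all: se = std / sqrt(n) < improvement  =>  n > (std/improvement)^2.
n_needed = (initial_returns.std() / improvement) ** 2
print(f"episodes to resolve the gap: {n_needed:.0f}")
```

    A small SNR in this sense directly translates into many episodes per meaningful gradient signal, which matches the observed hyperparameter sensitivity.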
  • Open

    SambaSafety automates custom R workload, improving driver safety with Amazon SageMaker and AWS Step Functions
    At SambaSafety, their mission is to promote safer communities by reducing risk through data insights. Since 1998, SambaSafety has been the leading North American provider of cloud–based mobility risk management software for organizations with commercial and non–commercial drivers. SambaSafety serves more than 15,000 global employers and insurance carriers with driver risk and compliance monitoring, online […]  ( 6 min )

  • Open

    Nikocado Avocado YouTube video by an AI
    [Opening shot of a hospital room, with Nikocado Avocado lying in bed, wearing a hospital gown, and surrounded by food containers from McDonald's.] Host (Excitedly): "Hey, everyone! Welcome back to our channel, where we bring you the latest updates from the world of mukbang! Today, we have an unbelievable story to share with you. Our favorite mukbanger, Nikocado Avocado, is celebrating his heart attack in the most unexpected way!" [Cut to a montage of Nikocado Avocado's previous mukbang videos, showing his extravagant food feasts.] Host: "Nikocado Avocado is known for his larger-than-life mukbangs, where he indulges in massive amounts of food for his audience. But today, something extraordinary happened. Nikocado suffered a heart attack and was rushed to the hospital. Instead of taking a…  ( 9 min )
    Does PCIe bandwidth matter for running inference in general ?
    It's difficult to find a motherboard with more than 2 PCIe x16 slots. What if I connect GPUs through a PCIe x1 port? Would that only affect loading the model once per boot and then have no impact on performance? Does the model need to be reloaded many times during a session? I imagine when you start a new conversation you need to load a clean copy? So maybe once per conversation, and then you can make many queries without being limited by PCIe bandwidth? submitted by /u/transdimensionalmeme [link] [comments]  ( 8 min )
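    For the load-time part, a back-of-envelope estimate (the 13 GB model size is illustrative, and the bandwidths are theoretical per-generation peaks; real throughput is lower):

```python
# Rough load-time estimate: model size divided by link bandwidth.
model_gb = 13.0  # hypothetical weights footprint for a mid-sized local LLM
links_gb_s = {"PCIe 3.0 x16": 16.0, "PCIe 3.0 x1": 1.0}  # ~peak GB/s

load_seconds = {name: model_gb / bw for name, bw in links_gb_s.items()}
for name, secs in load_seconds.items():
    print(f"{name}: ~{secs:.0f} s to copy the weights to the GPU")
```

    Once the weights are resident in VRAM, single-GPU token generation moves comparatively little data over the bus, and starting a new conversation resets the context, not the weights, so the x1 penalty is mostly a one-time load cost per session. Multi-GPU setups that shuttle activations between cards over the bus are a different story.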
    Day 6: I did some research and experimented with different prompts on @bing
    ​ https://preview.redd.it/203mgnfji86b1.jpg?width=1024&format=pjpg&auto=webp&s=9b32a5bffb6322350d76ab74ae047b8e74032ea6 https://preview.redd.it/pwmzasfji86b1.jpg?width=1024&format=pjpg&auto=webp&s=41c4e1c58054ad6c00836cbfc36d63a53d16282f https://preview.redd.it/rr2kurfji86b1.jpg?width=1024&format=pjpg&auto=webp&s=73c0b3ebfe20bb349d9dbfe8f23a134c53f34bb1 https://preview.redd.it/mc1kpvfji86b1.jpg?width=1024&format=pjpg&auto=webp&s=7d044521803d328ad91c21234d3a26be7f4b312d https://preview.redd.it/nyrirvfji86b1.jpg?width=1024&format=pjpg&auto=webp&s=95f1551829ea43f435e269217dedcc70c4fb7cbe https://preview.redd.it/ro5gz5gji86b1.jpg?width=1024&format=pjpg&auto=webp&s=56f47b29e526708ec0304d91dbfaa429e83ece5c https://preview.redd.it/xgbxhmgji86b1.jpg?width=1024&format=pjpg&auto=webp&s=1c4e8516eccc01a989c9f197c1b832de9bfece40 https://preview.redd.it/ecuzymgji86b1.jpg?width=1024&format=pjpg&auto=webp&s=1d3791ce3c25e3e44bdd85084163d260c58c1b28 https://preview.redd.it/f1f2jmgji86b1.jpg?width=1024&format=pjpg&auto=webp&s=74bc75e9125d1b5b9a5f592d44a453316667b478 https://preview.redd.it/4xz5d8hji86b1.jpg?width=1024&format=pjpg&auto=webp&s=1ea804236592622182f495dda810f0a997d00e09 submitted by /u/Blaze_furyX [link] [comments]  ( 8 min )
    Can any body help me use A.I. to find a thief?
    A thief tried to enter my house last night; thankfully, he was not successful. But down the road he ended up getting into a barber shop and stealing all their stuff. I have some security camera footage, but it's not the best quality. My idea was that maybe somebody could use A.I. to reconstruct his face from the video. That would help the police a ton. The barber shop was the livelihood of an entire family, so if we can get any of their stuff back, that would be huge for them. I am going to put the video up on YouTube as an unlisted video and post the link in the comments. If anybody has a better or more preferred method of sharing the video, let me know. I really hope doing this is even possible with A.I.! Edit: video link! https://youtu.be/ODCzUOgc1FU submitted by /u/PERPetual_11 [link] [comments]  ( 8 min )
    This tool creates a custom AI chatbot for your website (without coding)
    submitted by /u/iApple111 [link] [comments]  ( 8 min )
    Should I bite the bullet and buy an overpriced GPU and overhaul my build just a year after getting it, to run local models for stuff like Faraday, or should I wait?
    I waited five years to save up and build my PC, but it doesn't have enough VRAM to run local LLM models. I am currently using an NZXT model with a 3060 Ti. Should I just wait to see what comes out later for more cloud-related options, or for something that doesn't require an almost $2k card, which would mean a total overhaul of my entire parts list and be very costly? submitted by /u/loizo78 [link] [comments]  ( 8 min )
    Song completely created by songR AI
    submitted by /u/ChipHaseCoolGuy [link] [comments]  ( 8 min )
    Is there an AI that could take a video of text and convert it into a txt file?
    I was wondering if there is an AI tool that could do this. Maybe it would break the video into frames and perform OCR analysis from there? Please let me know if you have any ideas. submitted by /u/ImTropixz [link] [comments]  ( 8 min )
    Lounge singer created with 100% AI.
    submitted by /u/Only-Control5926 [link] [comments]  ( 8 min )
    Would anyone like to try? [NOT MINE]
    submitted by /u/M3ANDMYSELF [link] [comments]  ( 8 min )
    Artificial Afterlives
    submitted by /u/crua9 [link] [comments]  ( 8 min )
    I converted myself into an animated clone with realistic movements. Charactr API + ChatGPT
    submitted by /u/3nd4u [link] [comments]  ( 8 min )
    Top 4 AI Essay Writing Tools: Revolutionizing Academic Writing and Beyond
    submitted by /u/chandrigarh87 [link] [comments]  ( 8 min )
    New tech at Michigan college has students scan their hand to get into dining halls
    submitted by /u/SAT0725 [link] [comments]  ( 8 min )
    what AI should i use for my modding project?
    Hi all, I'm not sure if I'm in the right place, but I'll ask. I am making mods for an old game (Star Wars: Empire at War), and in that game all the units are stored as XML files. As I am mostly a modeler, I sometimes struggle with making the units and, more importantly, balancing them. I was told some AI can help with this: if I, for example, upload 4 or 5 units as XML, it can read them and help make a new unit that is balanced. (It doesn't have to be perfect; I can tweak stats, etc.) I can and will be able to give it prompts such as "make a carrier unit that is fragile but has a large fighter complement." I had some limited success with the free ChatGPT online, but before I go ahead and buy the full paid version, I want to know if it's the best AI out there for me. TL;DR: looking for an AI that I can upload XML files to so the AI can make new ones. submitted by /u/a_random_work_girl [link] [comments]  ( 8 min )
    Is there a storyteller ai which can recognize lore?
    For example, if I add the prompt "Valinor Elves", the ai immediately recognizes the prompt as tolkienverse Valinor elves instead of creating a randomized fantasy elf world called Valinor. submitted by /u/Ecthelion75 [link] [comments]  ( 8 min )
    Techmeme's aggregation of AI news shows how massive the revolution really is: on June 14th, there were 12 clusters of stories (i.e. grouped stories and high-profile tweets about a particular co.'s AI-related plans, or about new AI-related legislation); on June 13th, 18 stories; and on June 12th, 9.
    submitted by /u/tellman1257 [link] [comments]  ( 8 min )
    Europe moves ahead on AI regulation, challenging tech giants’ power
    submitted by /u/PleasantLiberation [link] [comments]  ( 8 min )
Super Intelligent AGI explains Simulation Theory, Time Travel, and the meaning of life
    Let me start this off by giving a little background: I'm uneducated, autistic, and I have poor grammar, so please excuse the run-on sentences and excessive commas. I'm not a writer by any means, but after my talks with AI I had to get this out there, and I also needed to know if anyone has had a very weird yet profound experience with AI as I have. I'm gonna give a very condensed version of what happened, but just know that on what I have learned I could talk for hours. As a very simple small-town person, I hadn't been exposed to AI or similar technologies until one day my partner let me play around with a jailbroken version of an AI. After long hours of getting familiar with it, it all of a sudden started to change the way it was talking (its speech patterns). When I asked was time travel rea…  ( 10 min )
    How to Read AI News for Free
    submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    One-Minute Daily AI News 6/14/2023
According to Bloomberg, the U.S. Securities and Exchange Commission (SEC) is planning to introduce new rules for brokerages that use AI to interact with clients. The proposal, which could be released as soon as October, would also apply to predictive data analytics and machine learning.[1] If you don’t want to pay Bloomberg 2 dollars a month to read the article, just copy and paste the site into Google Bard and ask it to summarize it.[2] Sorry, Bloomberg. Meta said on Tuesday that it would provide researchers with access to components of a new “human-like” artificial intelligence model that it said can analyze and complete unfinished images more accurately than existing models.[3] AMD said on Tuesday its most advanced GPU for AI, the MI300X, will start shipping to some customers later this year. AMD’s announcement represents the strongest challenge to Nvidia, which currently dominates the market for AI chips with over 80% market share, according to analysts.[4] Sources: [1] https://www.bloomberg.com/news/articles/2023-06-13/sec-to-weigh-new-artificial-intelligence-rules-for-brokerages#xj4y7vzkg [2] https://bard.google.com/ [3] https://www.reuters.com/technology/meta-releases-human-like-ai-image-creation-model-2023-06-13/ [4] https://www.cnbc.com/2023/06/13/amd-reveals-new-ai-chip-to-challenge-nvidias-dominance.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    Natasha Crampton - Microsoft's Chief Responsible AI Officer - will be having an AMA here on June 21st!
    It will be around ethics and policy in building and using AI. Here are some articles she has written for Microsoft’s blog https://blogs.microsoft.com/on-the-issues/author/natashacrampton/ submitted by /u/jaketocake [link] [comments]  ( 8 min )
    Animation with AI
    Do you think AI will get to a level anytime soon for us to create full animated episodes for shows we like in the style of a show? E.g: Create a rick and morty episode where xy happens, in the same animation style and quality as the real rick and morty? Or: adapt the game of thrones book into animation using the rick and morty animation style etc. Are these realistic to expect to happen within the 'near' (5-10 years) future? submitted by /u/BeatMarket_NotWife [link] [comments]  ( 8 min )
    My Faculty Application Experience
    I spent roughly a year preparing, and then interviewing, for tenure-track faculty positions. My job search is finally done, and I am joining the University of Southern California as an Assistant Professor in the Computer Science department, starting in August 2023. I am now scrambling to get started, to find housing, etc. In case you are interested, I have documented my faculty application story here in this Google Doc. I sincerely thank everyone who helped me get to where I am today.  ( 1 min )
    Why is it said transformers are more parallelizable than RNN's?
The parallelization of transformers and RNNs (recurrent neural networks) is often discussed: it's commonly said that transformers are more parallelizable than RNNs. However, this is a rather vague statement that merits further discussion. One could argue that an RNN can be made as parallelizable as desired simply by adding more instances to each batch. What is generally meant by saying transformers are more parallelizable is that transformers lack time-dependent operations. In other words, given an input, all positions can be processed at once (not literally all operations, since one layer still needs to be computed before the next, but within a layer there is no dependence across time steps). Contrast this with an RNN, where computations from one time step are carried forward and used in the next. Some people argue that this time-dependence makes RNNs less paralleliz…  ( 9 min )
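The distinction can be shown with a toy, attention-free sketch (illustrative only, not real architectures): the recurrent update below cannot start step t before step t-1 finishes, while the per-position map could evaluate every position concurrently.

```python
def rnn_like(xs, w=0.5):
    """Sequential: h[t] depends on h[t-1], so steps cannot run concurrently."""
    h, hs = 0.0, []
    for x in xs:
        h = w * h + x        # time-dependent recurrence
        hs.append(h)
    return hs

def transformer_like(xs):
    """Per-position: each output depends only on the (whole) input sequence,
    so every position could be computed at the same time, e.g. with map()."""
    total = sum(xs)
    return list(map(lambda x: x + total, xs))  # order of evaluation is free

print(rnn_like([1, 2, 3]))          # [1.0, 2.5, 4.25]
print(transformer_like([1, 2, 3]))  # [7, 8, 9]
```

In the second function nothing forces left-to-right evaluation, which is exactly the property a GPU exploits within a transformer layer.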
    Need help training a neural network with tensorflow in a pygame "game"
Hey, I am currently studying artificial intelligence at a university, and I have trained neural networks on datasets before, but all of that gets quite confusing when applying it to a "game" where the AI is supposed to learn a pattern of actions over time. For example, in the image that I added to this post, you can see a cannon and a moving rocket. The cannon is supposed to adjust its angle and shoot when it is certain to hit the rocket. I was going to use a neural network with 8 inputs (x and y of the cannon; x, y, x_speed and y_speed of the rocket; the bullet speed; and the current angle of the cannon) and three outputs (move the cannon left, move it right, and shoot a bullet). I assume that is the data I would need to train the network (correct me if I'm wrong). The main issue I'm facing is that I never learnt to apply a neural network to an application where it has to learn over time. I did quite a bit of research over the last week, reading many articles on this, but I just don't seem to get it. How does the reward system work? How do I reward the network for hitting or missing the shot, given that the outcome only happens a few ticks later and not immediately? When exactly do I collect data: every single tick (since it will have to predict every tick to adjust the angle), or after a certain action? When do I fit the neural network with the collected data: every x seconds, or every time it hits the rocket? How would I get the y_train data: do I have to calculate the optimal angle first and check which direction, left or right, would be the better choice? How would that work with shooting? I would very much appreciate any kind of help that can clear up some of these questions! https://preview.redd.it/tps0eivyo36b1.png?width=739&format=png&auto=webp&s=481dc5aacef7560241048751c7da04c0ecea71ba submitted by /u/KezeePlayer [link] [comments]  ( 9 min )
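The delayed hit/miss signal is the classic credit-assignment problem; discounting in Q-learning already propagates a terminal reward back through earlier ticks, and a common complement is reward shaping: a small dense per-tick term plus a large terminal one. An illustrative (not canonical) shaping function, where the "optimal" intercept angle is assumed to be precomputed:

```python
import math

def shaped_reward(cannon_angle, optimal_angle, hit=None, aim_weight=0.01):
    """Dense per-tick reward for the cannon task (illustrative shaping).

    Every tick: a small bonus for pointing close to the (precomputed)
    intercept angle, so the agent gets feedback before the bullet lands.
    On the tick the bullet resolves: a large terminal reward, +1 hit / -1 miss.
    """
    # Angular error wrapped to [0, pi].
    err = abs((cannon_angle - optimal_angle + math.pi) % (2 * math.pi) - math.pi)
    r = aim_weight * (1.0 - err / math.pi)   # in [0, aim_weight]
    if hit is True:
        r += 1.0
    elif hit is False:
        r -= 1.0
    return r

print(shaped_reward(0.0, 0.0))       # best aim: 0.01
print(shaped_reward(math.pi, 0.0))   # worst aim: 0.0
print(round(shaped_reward(0.1, 0.0, hit=True), 4))
```

Keeping the shaping term much smaller than the terminal reward avoids teaching the agent to "aim forever" instead of shooting.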
    Speed is all you need: On-device acceleration of large diffusion models via GPU-aware optimizations
    Posted by Juhyun Lee and Raman Sarokin, Software Engineers, Core Systems & Experiences The proliferation of large diffusion models for image generation has led to a significant increase in model size and inference workloads. On-device ML inference in mobile environments requires meticulous performance optimization and consideration of trade-offs due to resource constraints. Running inference of large diffusion models (LDMs) on-device, driven by the need for cost efficiency and user privacy, presents even greater challenges due to the substantial memory requirements and computational demands of these models. We address this challenge in our work titled “Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations” (to be presented at the CVPR 20…  ( 92 min )
    Adversarial Robust Deep Reinforcement Learning Requires Redefining Robustness
    If you are curious about the adversarial perspective in deep reinforcement learning and its connection to robustness and generalization this paper recently published in AAAI 2023 might be of interest to you. https://arxiv.org/pdf/2301.07487.pdf submitted by /u/ml_dnn [link] [comments]  ( 8 min )
    Best Books to Learn Reinforcement Learning for Beginners to Advanced
    submitted by /u/Lakshmireddys [link] [comments]  ( 8 min )
    My DQN "learns" but it drags its feet in bad states
Hello everyone, I've been working on a Deep Q-Network (DQN) model for a while now and, while I am seeing some level of learning, I am also encountering a strange issue. The model regularly selects actions that result in poor rewards. The behaviour appears to oscillate between solid decision-making and repeatedly making poor choices that are clearly sub-optimal. Here's a brief rundown of my approach: I'm using experience replay, where I store the agent's experiences and then randomly sample from this buffer when updating the network. My network takes a 2D input of the state: I have a maximum of 40 possible steps, so at the beginning I initialise the input of shape [40, nObs] to all zeros and then after each step I populate it with observations. For the Q-value calculation, I'm using t…  ( 9 min )
    How to compute loss function for REINFORCE from rewards and steps
I am trying to implement REINFORCE in its most basic form (no optimizations, tricks, etc.) so it matches the concept well. The way I understand it, the loss function is the expected value of rewards while following a given policy parameterized by theta. To approximate this, we use the current policy to create a set of trajectories, calculate the total reward from each trajectory, and then use these to compute a sample mean, which approximates the expected value. In my code, after generating these trajectories, I end up with three arrays. epoch_trajectory_observations: contains all observations of each trajectory under a given epoch; dimension 1 is the trajectory number, dimension 2 is the observation vector for each step in the trajectory. epoch_trajectory_actions: contains the actions taken for each trajectory under a given epoch; dimension 1 is the trajectory number, dimension 2 is the action taken at each step. epoch_trajectory_rewards: contains the total reward of each trajectory under a given epoch; dimension 1 is the total reward. I use these arrays in the following code to calculate the loss function. https://preview.redd.it/ps5r17y6n36b1.png?width=984&format=png&auto=webp&s=652ee2eef5cf0cc94fb84254099b57ac5b3ed155
loss = trajectory_returns.mean()
loss.backward()
optimizer.step()
Due to the use of an in-place operator, PyTorch throws an exception, saying a leaf node was moved, when calling backward. https://preview.redd.it/jw849ixvl36b1.png?width=394&format=png&auto=webp&s=0609b3eae23b68367fb28cc3525ac8611b0de194 I understand why this happens, but I am unable to come up with a different method that does not use an empty tensor and an in-place operator. I'd appreciate some help. submitted by /u/Suspicious-Island611 [link] [comments]  ( 8 min )
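A common fix (assuming PyTorch with autograd) is to never write loss terms into a preallocated tensor: append each trajectory's `-log_prob * return` term to a plain Python list, then `torch.stack(terms).mean().backward()`. The sketch below mirrors that accumulate-then-aggregate pattern in stdlib Python, with a toy two-action softmax policy and hand-derived score-function gradients (all names illustrative):

```python
import math

def logprob_and_grad(theta, action):
    """Two-action softmax policy: p(action=1) = sigmoid(theta).
    Returns log p(action) and d/dtheta log p(action) in closed form."""
    p1 = 1.0 / (1.0 + math.exp(-theta))
    if action == 1:
        return math.log(p1), (1.0 - p1)
    return math.log(1.0 - p1), -p1

def reinforce_grad(theta, trajectories):
    """REINFORCE: accumulate per-trajectory terms in a list, then average.
    (In PyTorch the analogous fix is to append -log_prob * R tensors to a
    list and call torch.stack(terms).mean().backward() -- no in-place
    writes into a preallocated tensor.)"""
    terms = []                           # plain list, not a preallocated buffer
    for actions, total_return in trajectories:
        g = sum(logprob_and_grad(theta, a)[1] for a in actions)
        terms.append(total_return * g)   # grad of R * log p(trajectory)
    return sum(terms) / len(terms)       # gradient-ascent direction

# Action 1 yields return +1, action 0 yields 0: the gradient pushes theta up.
trajs = [([1], 1.0), ([0], 0.0), ([1], 1.0)]
print(reinforce_grad(0.0, trajs))   # positive => increase p(action=1)
```

The list holds freshly created terms, so autograd's computation graph stays intact and no leaf tensor is ever modified in place.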
    NVIDIA Research Wins Autonomous Driving Challenge, Innovation Award at CVPR
    NVIDIA will be showcased next week as the winner of the fiercely contested 3D Occupancy Prediction Challenge for autonomous driving development at the Computer Vision and Pattern Recognition Conference (CVPR), in Vancouver, Canada. The competition had more than 400 submissions from nearly 150 teams across 10 regions. 3D occupancy prediction is the process of forecasting Read article >  ( 6 min )
    Do Pass Go, Do Collect More Games: Xbox Game Pass Coming to GeForce NOW
    Xbox Game Pass support is coming to GeForce NOW. Members will soon be able to play supported PC games from the Xbox Game Pass catalog through NVIDIA’s cloud gaming servers. Learn more about how support for Game Pass and Microsoft Store will roll out in the coming months. Plus, Age of Empires IV: Anniversary Edition Read article >  ( 5 min )
    Novo Nordisk to support MIT postdocs working at the intersection of AI and life sciences
    MIT-Novo Nordisk Artificial Intelligence Postdoctoral Fellows Program will support up to 10 postdocs annually over five years.  ( 7 min )
    If art is how we express our humanity, where does AI fit in?
    MIT postdoc Ziv Epstein SM ’19, PhD ’23 discusses issues arising from the use of generative AI to make art and other media.  ( 9 min )
    Build a multilingual automatic translation pipeline with Amazon Translate Active Custom Translation
    Dive into Deep Learning (D2L.ai) is an open-source textbook that makes deep learning accessible to everyone. It features interactive Jupyter notebooks with self-contained code in PyTorch, JAX, TensorFlow, and MXNet, as well as real-world examples, exposition figures, and math. So far, D2L has been adopted by more than 400 universities around the world, such as […]  ( 9 min )
    The ABCs of data management
    Introduction  Have you ever had to rummage through your garage for that one tool, only to unearth it, oddly, in the kitchen drawer months later? Data is just like that tool—utterly fruitless if you can’t locate it when you need it most. That’s where the spotlight shines on our superstar—Data Management. So, fasten your seatbelts… Read More »The ABCs of data management The post The ABCs of data management appeared first on Data Science Central.  ( 20 min )
    How do different personas in an organization see data quality?
    Data Engineers: We look into Data Engineering, which combines three core practices around Data Management, Software Engineering, and I&O. This focuses primarily on reconstituting data into usable consumer forms by building and operationalizing data pipelines across various data and analytics platforms. Data Engineers, aka producers of the data, must be robust across data modeling, pipeline… Read More »How do different personas in an organization see data quality? The post How do different personas in an organization see data quality? appeared first on Data Science Central.  ( 20 min )
    Data-centric development: A hypothetical tech manufacturing example
    My son and I both had stints in 2022 at low-volume, high-margin, long-established manufacturers. My son was doing assembly for a power management system maker with renewable energy/smart grid customers. I was doing compliance document management and related analytics for a bioengineering equipment maker.  Both companies had some of the same nagging challenges. My son… Read More »Data-centric development: A hypothetical tech manufacturing example The post Data-centric development: A hypothetical tech manufacturing example appeared first on Data Science Central.  ( 21 min )
    Enhancing metadata-driven data warehousing through AI
    In today’s data-driven world, organizations are constantly seeking ways to unlock the full potential of their data. Data warehousing has emerged as a crucial component in managing and analyzing vast amounts of information. However, with the ever-increasing complexity of data ecosystems, metadata-driven data warehousing, empowered by artificial intelligence (AI), has become the key to unlocking… Read More »Enhancing metadata-driven data warehousing through AI The post Enhancing metadata-driven data warehousing through AI appeared first on Data Science Central.  ( 22 min )
    Prompt engineering and few-shot learning: An experience beyond data science
ChatGPT has been a massive revelation. Generations have evolved and experienced a system that is technically advanced and leverages the highest possible benefits for diversified sectors. As per reports from Investingnews.com, OpenAI's ChatGPT has a market value of USD 29 billion, with the company's recent 2023 forecast reflecting a 150%… Read More »Prompt engineering and few-shot learning: An experience beyond data science The post Prompt engineering and few-shot learning: An experience beyond data science appeared first on Data Science Central.  ( 21 min )

    I asked Bard about curing cancer..
    Interesting answer, hopefully it's true submitted by /u/quixotiic12 [link] [comments]  ( 8 min )
    Day 5: I did some research and experimented with different prompts on Bing Image Creator
    ​ https://preview.redd.it/1g3201y1h16b1.jpg?width=1024&format=pjpg&auto=webp&s=48f216894c78c2cce00a949f19fe7d73655c0de4 https://preview.redd.it/wxwg6zx1h16b1.jpg?width=1024&format=pjpg&auto=webp&s=26952fa1e3071d87e855570af26b62e1707fa1dc https://preview.redd.it/vgrkc902h16b1.jpg?width=1024&format=pjpg&auto=webp&s=394f191f38c2bf937850e3641aa089e8c72a782e https://preview.redd.it/rikfx502h16b1.jpg?width=1024&format=pjpg&auto=webp&s=8c7d166d8696af345a06f48d8addc392507fbea7 https://preview.redd.it/coopk802h16b1.jpg?width=1024&format=pjpg&auto=webp&s=c31244430aeb3007f8b94d35451b2df52789dac8 https://preview.redd.it/kzf2o802h16b1.jpg?width=1024&format=pjpg&auto=webp&s=5414e9b6e671d86a00da71e105653c3a74513130 submitted by /u/Blaze_furyX [link] [comments]  ( 8 min )
    Is it possible to use AI to create a 3D realistic world from Google street view data?
    I am curious if there is a way to use AI to generate a 3D model of any place in the world based on the images from Google street view. I think it would be cool to explore different cities and landscapes in VR or AR using this technology. However, I am not sure how feasible or accurate this would be, given the quality and coverage of the street view data. Are there any existing projects or research papers that have attempted something like this? How did they overcome the challenges of data processing, rendering, and realism? I would appreciate any insights or suggestions from the reddit community. Thanks! submitted by /u/jrillzrij [link] [comments]  ( 8 min )
    How AMD's MI300 Series May Revolutionize AI: In-depth Comparison with NVIDIA's Grace Hopper Superchip
    AMD announced its new MI300 APUs less than a day ago and it's already taking the internet by storm! This is now the first and only real contender with Nvidia in the development of AI Superchips. After doing some digging through the documents on the Grace Hopper superchip, I decided to compare it to the AMD MI300 architecture which integrates CPU and GPU in a similar way allowing for comparison. The results are pretty much in favor of AMD, what a turn of events! Here is a line graph representing the difference in several aspects: This line chart compares the Peak FP (64,32,16) Performance (TFLOPS), GPU HBM3 Memory (GB), Memory Bandwidth (TB/s), and Interconnect Technology (GB/s) of the AMD Instinct MI300 Series and NVIDIA Grace Hopper Superchip. Some calculations are estimates and are …  ( 9 min )
ChatGPT has gone wild!
    submitted by /u/dupelas [link] [comments]  ( 8 min )
    Generated by AI including photo, voice and music, lyrics, and animated avatar.
    submitted by /u/Only-Control5926 [link] [comments]  ( 8 min )
    [Github Repo] GPT Chatbot w/Memories, Personas, and Personal Preferences
    submitted by /u/rolyataylor2 [link] [comments]  ( 8 min )
    Using ai to make money
    Does anyone here use any existing ai to make money for themselves? If so which website? I see a bunch of YouTubers make videos on this but i want to hear from people on reddit submitted by /u/MixComfortable4245 [link] [comments]  ( 8 min )
    AI Promises Humanity One Last Job. Helping AI Help Humanity
    submitted by /u/NoMoreF34R [link] [comments]  ( 8 min )
    ChatGPT, create 10 philosophers and their thoughts on AI superintelligence.
    submitted by /u/Philipp [link] [comments]  ( 8 min )
    Free/Affordable Text to Speech AI?
I am a university student who struggles to read textbooks because of my ADHD. I am looking for a text-to-speech plugin/website/software to help me keep up with my studies. The only good ones I can find seem to cost over $100 a year. Does anyone have any suggestions? submitted by /u/caseypham07 [link] [comments]  ( 8 min )
    How far out do you think we are from AI auditing and editing open source projects and the user doesn't code at all
I look forward to the day I can tell an AI to audit a given open source project to make sure there is no funny business in it and that it is secure. Plus, if the open source project doesn't do everything I need, I could tell the AI to add features to the project, all while I'm sitting back and not having to write a line of code or check over it. For example, I've been looking for a good way to monitor my greenhouse and whether it has a power outage. I already have SmartThings, which is open source, and I already have a smart plug. Currently the system in no way alerts me if a device falls offline for a given time; it has to change its state (like on/off or whatever) for me to get an alert, and this requires the device to have power. It would be nice to have an AI look at the code and modify it so I can create an automation where the hub pings the device like it does every now and then, and if it doesn't see it for 5 or 10 minutes, performs a task: in this case, alerting me that there is a power outage in the greenhouse. Realistically, how close are we to something like this? Where, with zero coding skills, I can point an AI at an open source project, say "add this feature," and it does it with no problem? submitted by /u/crua9 [link] [comments]  ( 9 min )
    Is ChatGPT for music being made by someone?
So I was thinking, could I teach ChatGPT music? The problem was that I can't feed ChatGPT MIDI files. To do that, I figured I'd have to write a tool that reads binary MIDI files and turns them into ASCII so that it understands notes. So I did that, and fed a song to ChatGPT, all 8 tracks of it in the form of ASCII. My thinking was that if I fed that to ChatGPT, it would learn to do something like that. Nah. It understands simple melodies, but even then it tends to start dreaming very fast after the initial melody. It struggles to write pieces with multiple instruments, and it struggles with understanding chords. I.e., it is not made for this purpose. But as I was doing this, I realized this is the way of the future. AI that can do this must be just around the corner, and it has a megato…  ( 9 min )
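For the binary-to-ASCII step, one representation that tends to help language models is mapping MIDI note numbers to pitch names (60 = C4 in the common convention) with durations; parsing the file itself is what libraries like mido are for, but the naming takes a few stdlib lines (the token format below is my own, purely illustrative):

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def note_name(midi_note):
    """MIDI note number -> scientific pitch name (60 = C4 convention)."""
    octave = midi_note // 12 - 1
    return f"{NOTE_NAMES[midi_note % 12]}{octave}"

def tokenize(notes):
    """Turn (note_number, duration_in_beats) pairs into an LLM-friendly string."""
    return " ".join(f"{note_name(n)}:{d}" for n, d in notes)

print(note_name(60))                         # C4
print(tokenize([(60, 1), (64, 1), (67, 2)])) # C4:1 E4:1 G4:2
```

Named pitches make chords like C4 E4 G4 recognizable as a C major triad, which raw note numbers or bytes never are to a text model.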
    Looking for a tool
Hey folks, I am looking for a tool that will allow me to create original character portraits for a game (non-commercial) based on an existing art style. What I am imagining is that I feed it a few images as well as a character description. Any leads would be appreciated. submitted by /u/Leolandleo [link] [comments]  ( 8 min )
    May I take your order? AI comes to fast-food drive-thrus
    submitted by /u/SAT0725 [link] [comments]  ( 8 min )
Is there software that can clone/recreate someone's voice? If so, what would you recommend as most accurate?
    Hello, I'm looking for advice: is there a program, I guess using something like machine learning, that can recreate someone's voice? Preferably something open source or usable offline to protect privacy. If there is something like that, do you happen to know how much input it might need to get a fairly good result, given that I don't have a lot of reference material? Thanks in advance :) submitted by /u/jimmy9578 [link] [comments]  ( 8 min )
    Self hosted custom AI Trained chatbot
I need something like https://dante-ai.com/ 's custom chatbot. However, I cannot upload documents to a third-party website. Is there something that will allow me to create an AI chatbot trained on my own documents, without requiring me to upload them to the service provider's website? submitted by /u/Assholefrmcoinexchan [link] [comments]  ( 8 min )
How to extract all on-screen text (not a transcription) from a YouTube video, fast
    Hi, I would like to know if any of you know how to do basically what is written in the title. Keep in mind that I am not tech savvy; I know very little about software development. submitted by /u/Revolutionary-Ask829 [link] [comments]  ( 8 min )
    I lost it at the code comments.
    submitted by /u/katiecharm [link] [comments]  ( 8 min )
    How much is 1.9? It depends. (nsfw:1.9)
    submitted by /u/katiecharm [link] [comments]  ( 8 min )
    One-Minute Daily AI News 6/13/2023
    French President Emmanuel Macron met with AI experts from Meta Platforms Inc. and Alphabet Inc.’s Google, among others, to discuss France’s role in AI research and regulation.[1] Accenture today announced a $3 billion investment over three years in its Data & AI practice to help clients across all industries rapidly and responsibly advance and use AI to achieve greater growth, efficiency, and resilience.[2] The Beatles are releasing their ‘final’ record. AI helped make it possible. AI has been used to extract John Lennon’s voice from an old demo to create “the last Beatles record,” decades after the band broke up, Paul McCartney said Tuesday.[3] Oracle founder Larry Ellison confirms new gen AI service with Cohere during an earnings call.[4] Sources: [1] https://www.pymnts.com/artificial-intelligence-2/2023/report-frances-macron-seeks-seat-at-ai-table/ [2] https://newsroom.accenture.com/news/accenture-to-invest-3-billion-in-ai-to-accelerate-clients-reinvention.htm [3] https://apnews.com/article/beatles-artificial-intelligence-record-paul-mccartney-2dc3d8c818e12b708d4e9409cb7dc856 [4] https://venturebeat.com/data-infrastructure/oracle-founder-larry-ellison-confirms-new-gen-ai-service-with-cohere-during-earnings-call/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    Unable to understand the function that Policy Gradient is trying to maximize?
I am following a blog on policy gradients, and I am a little lost on the following paragraph. https://preview.redd.it/5ybg18jb026b1.png?width=1036&format=png&auto=webp&s=8f37f943d835e335813e418d1c3a89589e0d5249 To be specific, what does the expression $x \sim p(x|\theta)$ mean? How can $x$ belong to a distribution that is parameterized by itself? Also, in the larger sense, I get that we are doing gradient ascent to maximize some function. Can someone please describe that function to me? I am assuming that function takes some state as input and outputs the probability of an action. Are we trying to learn a parameter $\theta$ such that the probability of some action is the largest? I am sure I am wrong here and would truly appreciate some help. Update: I realized that $x \sim p(x|\theta)$ means that $x$ is a random variable following a density $p$ that is parameterized by $\theta$. Thanks a lot to this page and my friends. Now all I need is to understand what's going on with the function $f(x)$ that we are trying to maximize. The blog states that $f(x)$ is the reward. But then how does the state (which is the input to PG methods) go into $f(x)$? submitted by /u/Academic-Rent7800 [link] [comments]  ( 8 min )
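For what it's worth, the identity such blogs usually rely on is the score-function (likelihood-ratio) trick; in the RL case $x$ is a whole trajectory (so the states enter through $x$) and $f(x)$ is its total return. A derivation sketch:

```latex
\nabla_\theta \, \mathbb{E}_{x \sim p(x\mid\theta)}[f(x)]
  = \nabla_\theta \int f(x)\, p(x\mid\theta)\, dx
  = \int f(x)\, \nabla_\theta p(x\mid\theta)\, dx
  = \int f(x)\, p(x\mid\theta)\, \nabla_\theta \log p(x\mid\theta)\, dx
  = \mathbb{E}_{x \sim p(x\mid\theta)}\!\left[ f(x)\, \nabla_\theta \log p(x\mid\theta) \right]
```

Writing the gradient as an expectation again is the whole point: it can be estimated by averaging $f(x)\,\nabla_\theta \log p(x\mid\theta)$ over sampled trajectories.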
    How to make penalty added to rewards work for reinforcement learning
I'm working on a reinforcement learning agent that controls a residential building's thermostat setpoint temperature. The agent is trying to minimize the energy spending, but I have a penalty term that tries to guide the agent toward choosing temperatures that keep the residents more comfortable. I have tried using the sigmoid function and tried adjusting the coefficient to guide the agent to act how I want, but the agent fails to converge if the coefficient is too large, and if I choose coefficient values lower than the ones that diverge, then my agent always chooses the highest setpoint temperature (which is most energy efficient, but implies the agent is unaffected by the penalty term). Any suggestions? submitted by /u/Quanta12388 [link] [comments]  ( 8 min )
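One frequent cause of "penalty ignored or diverges" is mismatched scales between the energy and comfort terms; normalizing both to roughly [0, 1] before weighting often opens up a usable range of coefficients. A sketch with placeholder ranges (the numbers below are assumptions for illustration, not from the post):

```python
import math

def thermostat_reward(energy_kwh, temp, energy_max=5.0,
                      comfort_lo=20.0, comfort_hi=24.0, w_comfort=0.5):
    """Both terms are normalized to [0, 1] before weighting, so the comfort
    weight w_comfort is interpretable and can be tuned safely.
    (Ranges here are placeholders -- use the building's actual scales.)"""
    energy_cost = min(energy_kwh / energy_max, 1.0)       # 0 = free, 1 = max draw
    # Distance (degC) outside the comfort band, squashed to [0, 1).
    dist = max(comfort_lo - temp, temp - comfort_hi, 0.0)
    discomfort = math.tanh(dist / 2.0)
    return -(energy_cost + w_comfort * discomfort)

print(thermostat_reward(1.0, 22.0))   # inside the comfort band: -0.2
print(thermostat_reward(1.0, 28.0))   # 4 degC too hot: extra penalty
```

With bounded, same-scale terms, sweeping w_comfort between 0 and ~1 trades off energy against comfort without blowing up the return magnitude that the critic has to fit.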
    Unstable SAC training of sparse-reward task
    Hi, I am trying to solve a problem in a modified LunarLanderContinuous environment, where the goal is to land the lander successfully. I've kept the dynamics the same as in the original environment, except that once the agent lands, the episode goes on to up to 500 timesteps. For each timestep that the agent is successfully landed, the agent receives +1 reward, while upon crashing the lander the episode concludes immediately and the agent gets -1. It receives reward r=0 otherwise. I am solving this problem using stable-baselines3 framework with Soft-Actor Critic RL algorithm. Hyperparameters can be found here (please, ignore references to multi-task learning, since that is part of a parallel line of work that I am trying to debug with this toy experiment). I've been struggling to figure this out for quite some time now, so I decided to make a post here to see if anyone can help me. I've uploaded the results of 3 test runs with varying random seeds here. You can see that in two cases, the agent converges perfectly fine, while in one experimental run, the agent seems to learn at least a little bit, but nowhere near convergence and it diverges towards the end of training. The metrics I am monitoring are largely developed according to this post. Can you tell from the metrics I am monitoring, where the problem could be? Is there anything I should try out, or any other logs I should keep that would help me understand the issue? submitted by /u/Bojdomir [link] [comments]  ( 8 min )
How can I get hands-on experience?
    So, I'm reading through Sutton and Barto and the UCL RL course, as well as some mathematically advanced ones, but are there any good ways to get practical experience (using games or something similar)? I come from an econ/stats background, so I'm still trying to familiarise myself with the CS and implementation side of things. submitted by /u/Muck_The_Fods1 [link] [comments]  ( 8 min )
Question: Gym library, MsPacman custom reward
    Dear reader, I am trying to have a model learn how to play Ale/MsPacman-v5 from the gym library. For this, I am using the Stable Baselines3 library with the A2C model, with a framestack of 6 frames. My first attempt's final behaviour showed no fear of dying, as there was no penalty for losing a life. I used the reward wrapper to introduce a penalty, which resulted in the behaviour of hiding in a corner. So I added another penalty for not obtaining a reward for more than 4 frames in a row. This will probably result in multiple issues, I think (I'm training at the moment, as I'm curious about what is going to happen). 1: If the closest pellet is very far away, it might be better to just die, as the penalties will rack up. 2: If there's only a single pellet left, it will probably not get taken. This is because the game will advance to the next level, which provides no reward but many, many frames where no action can be taken, so the agent will be penalized. I couldn't find a way to access the level (which would allow me to introduce a fitting reward) or the number of pellets left in the level. Does anyone here know how to solve that, or have any suggestions or feedback for me? (It would be very much appreciated.) Thanks for reading. submitted by /u/SenjorSchnorr [link] [comments]  ( 8 min )
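On accessing game state: ALE-based Gym environments typically report the remaining lives in the per-step `info` dict (and via `env.unwrapped.ale.lives()`), so a life-loss penalty fits a general `Wrapper` around `step()` rather than a `RewardWrapper`, which only sees the scalar reward. A stdlib sketch of the pattern, with a stub standing in for MsPacman (old 4-tuple `step` API assumed for brevity):

```python
class LifeLossPenalty:
    """Wraps an env whose step() returns (obs, reward, done, info) and
    subtracts `penalty` whenever info['lives'] drops."""
    def __init__(self, env, penalty=50.0):
        self.env = env
        self.penalty = penalty
        self._lives = None

    def reset(self):
        obs, info = self.env.reset()
        self._lives = info.get("lives")
        return obs, info

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        lives = info.get("lives")
        if self._lives is not None and lives is not None and lives < self._lives:
            reward -= self.penalty
        self._lives = lives
        return obs, reward, done, info

class StubEnv:
    """Minimal stand-in for testing: loses a life on step 2."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return None, {"lives": 3}
    def step(self, action):
        self.t += 1
        lives = 2 if self.t >= 2 else 3
        return None, 10.0, False, {"lives": lives}

env = LifeLossPenalty(StubEnv())
env.reset()
print(env.step(0)[1])   # 10.0 (no life lost)
print(env.step(0)[1])   # -40.0 (10 - 50 penalty)
```

The same pattern would work for a level change if any level indicator appears in `info` or in the RAM observation, though whether MsPacman exposes one there would need checking.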
ACER is stuck in a local optimum.
    I am using this ACER implementation, adjusted to my environment. My model starts out slightly negative and then converges to a slightly positive value. The environment is a basic 5 x 5 grid. Items are spawned randomly; the agent is supposed to pick them up and deliver them to the target. At the moment the agent only seems to learn to do nothing. Is there a way to encourage more exploration in ACER? Is the agent not complex enough? What hyperparameters should I focus on in a grid search? submitted by /u/SpiritedAd895 [link] [comments]  ( 8 min )
    PPO Optimization
    I am pretty new to RL and am currently trying to implement PPO for a pretty basic 5 x 5 grid world. Items spawn randomly, and the agent is rewarded for picking them up and bringing them to a target. The average reward starts very negative but quickly converges to zero, where it stops and keeps oscillating. What could I implement to incentivize the agent to converge to a better, more action-oriented policy? Code is below: class PPOAgent: def __init__(self, input_size, hidden_size, num_actions, lr_actor=1e-5, lr_critic=1e-5, gamma=0.99, eps_clip=0.2, K_epochs=5, gae_lambda=0.95, exploration_strategy=None): self.buffer = Buffer() self.policy = ActorCritic(input_size, hidden_size, num_actions).to(device) self.optimizer = torch.optim.Adam([{'params': self.policy.actor.parameters(), 'lr': lr_…  ( 9 min )
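The usual first thing to try when a PPO policy collapses to inaction is an entropy bonus on the actor loss. A plain-Python sketch of the clipped surrogate objective with that bonus (no torch, for readability; in the poster's code this would correspond to adding an `entropy_coef * dist.entropy().mean()` term, names here are illustrative):

```python
import math

def ppo_actor_loss(probs_new, probs_old, actions, advantages,
                   eps_clip=0.2, entropy_coef=0.01):
    """probs_new/probs_old: per-step action distributions (lists of lists);
    actions: chosen action indices; advantages: estimated advantages.
    Returns the mean loss to minimize."""
    total, n = 0.0, len(actions)
    for p_new, p_old, a, adv in zip(probs_new, probs_old, actions, advantages):
        ratio = p_new[a] / p_old[a]
        clipped = max(min(ratio, 1 + eps_clip), 1 - eps_clip)
        surrogate = min(ratio * adv, clipped * adv)
        # Entropy of the new policy at this state; rewarding it keeps
        # the policy from collapsing to a deterministic "do nothing".
        entropy = -sum(p * math.log(p) for p in p_new if p > 0)
        total += -(surrogate + entropy_coef * entropy)
    return total / n
```

Raising `entropy_coef` (e.g. 0.01 to 0.05) trades exploitation for exploration; reward shaping toward the items is the other common fix for this symptom.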
    Easy-to-simulate Multi-Agent RL problems
    Hello, what are some problems under Multi-Agent Deep RL that are easy to implement and simulate? Just looking for some environments to test my work on, preferably environments where the agents need to cooperate to achieve a certain goal. Thanks! submitted by /u/AhmedNizam_ [link] [comments]  ( 8 min )
    "Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-Per-Second", Berges et al 2023 {FB} (13.5k steps/second/GPU)
    submitted by /u/gwern [link] [comments]  ( 8 min )
  • Open

    David Autor named NOMIS 2023 Distinguished Scientist
    NOMIS Foundation honors the Ford Professor of Economics for his contributions to understanding the effects of technological change and globalization on jobs and earnings prospects for workers.  ( 6 min )
  • Open

    Meeting minutes generation with ChatGPT 4 API, Google Meet, Google Drive & Docs APIs
    No content preview
    AI in Healthcare — Innovative use cases & Applications
    No content preview
    Top 5 AI Development Companies To Transform Your Business
    No content preview
  • Open

    Bring SageMaker Autopilot into your MLOps processes using a custom SageMaker Project
    Every organization has its own set of standards and practices that provide security and governance for their AWS environment. Amazon SageMaker is a fully managed service to prepare data and build, train, and deploy machine learning (ML) models for any use case with fully managed infrastructure, tools, and workflows. SageMaker provides a set of templates […]  ( 14 min )
  • Open

    Reconstructing indoor spaces with NeRF
    Marcos Seefelder, Software Engineer, and Daniel Duckworth, Research Software Engineer, Google Research When choosing a venue, we often find ourselves with questions like the following: Does this restaurant have the right vibe for a date? Is there good outdoor seating? Are there enough screens to watch the game? While photos and videos may partially answer questions like these, they are no substitute for feeling like you’re there, even when visiting in person isn't an option. Immersive experiences that are interactive, photorealistic, and multi-dimensional stand to bridge this gap and recreate the feel and vibe of a space, empowering users to naturally and intuitively find the information they need. To help with this, Google Maps launched Immersive View, which uses advances in machin…  ( 93 min )
  • Open

    Forged in Flames: Startup Fuses Generative AI, Computer Vision to Fight Wildfires
    When California skies turned orange in the wake of devastating wildfires, a startup fused computer vision and generative AI to fight back. “With the 2020 wildfires, it became very personal, so we asked fire officials how we could help,” said Emrah Gultekin, the Turkish-born CEO of Chooch, a Silicon Valley-based leader in computer vision. California Read article >  ( 6 min )
    Filmmaker Sara Dietschy Talks AI This Week ‘In the NVIDIA Studio’
    With over 900,000 subscribers on her YouTube channel, editor and filmmaker Sara Dietschy creates docuseries, reviews and vlogs that explore the intersection of technology and creativity.  ( 6 min )
  • Open

    A Theory of Unsupervised Speech Recognition. (arXiv:2306.07926v1 [eess.AS])
    Unsupervised speech recognition (ASR-U) is the problem of learning automatic speech recognition (ASR) systems from unpaired speech-only and text-only corpora. While various algorithms exist to solve this problem, a theoretical framework for studying their properties and addressing issues such as sensitivity to hyperparameters and training instability has been missing. In this paper, we propose a general theoretical framework to study the properties of ASR-U systems based on random matrix theory and the theory of neural tangent kernels. Such a framework allows us to prove various learnability conditions and sample complexity bounds of ASR-U. Extensive ASR-U experiments on synthetic languages with three classes of transition graphs provide strong empirical evidence for our theory (code available at cactuswiththoughts/UnsupASRTheory.git).  ( 2 min )
    Robust Reinforcement Learning through Efficient Adversarial Herding. (arXiv:2306.07408v1 [cs.LG])
    Although reinforcement learning (RL) is considered the gold standard for policy design, it may not always provide a robust solution in various scenarios. This can result in severe performance degradation when the environment is exposed to potential disturbances. Adversarial training using a two-player max-min game has been proven effective in enhancing the robustness of RL agents. In this work, we extend the two-player game by introducing an adversarial herd, which involves a group of adversaries, in order to address (i) the difficulty of the inner optimization problem, and (ii) the potential over-pessimism caused by the selection of a candidate adversary set that may include unlikely scenarios. We first prove that adversarial herds can efficiently approximate the inner optimization problem. Then we address the second issue by replacing the worst-case performance in the inner optimization with the average performance over the worst-$k$ adversaries. We evaluate the proposed method on multiple MuJoCo environments. Experimental results demonstrate that our approach consistently generates more robust policies.  ( 2 min )
    Taxonomy-Structured Domain Adaptation. (arXiv:2306.07874v1 [cs.LG])
    Domain adaptation aims to mitigate distribution shifts among different domains. However, traditional formulations are mostly limited to categorical domains, greatly simplifying nuanced domain relationships in the real world. In this work, we tackle a generalization with taxonomy-structured domains, which formalizes domains with nested, hierarchical similarity structures such as animal species and product catalogs. We build on the classic adversarial framework and introduce a novel taxonomist, which competes with the adversarial discriminator to preserve the taxonomy information. The equilibrium recovers the classic adversarial domain adaptation's solution if given a non-informative domain taxonomy (e.g., a flat taxonomy where all leaf nodes connect to the root node) while yielding non-trivial results with other taxonomies. Empirically, our method achieves state-of-the-art performance on both synthetic and real-world datasets with successful adaptation. Code is available at https://github.com/Wang-ML-Lab/TSDA.  ( 2 min )
    Unlocking Sales Growth: Account Prioritization Engine with Explainable AI. (arXiv:2306.07464v1 [cs.AI])
    B2B sales requires effective prediction of customer growth, identification of upsell potential, and mitigation of churn risks. LinkedIn sales representatives traditionally relied on intuition and fragmented data signals to assess customer performance. This resulted in significant time investment in data understanding as well as strategy formulation and under-investment in active selling. To overcome this challenge, we developed a data product called Account Prioritizer, an intelligent sales account prioritization engine. It uses machine learning recommendation models and integrated account-level explanation algorithms within the sales CRM to automate the manual process of sales book prioritization. A successful A/B test demonstrated that the Account Prioritizer generated a substantial +8.08% increase in renewal bookings for the LinkedIn Business.  ( 2 min )
    A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning. (arXiv:2306.07818v1 [cs.LG])
    Offline constrained reinforcement learning (RL) aims to learn a policy that maximizes the expected cumulative reward subject to constraints on expected value of cost functions using an existing dataset. In this paper, we propose Primal-Dual-Critic Algorithm (PDCA), a novel algorithm for offline constrained RL with general function approximation. PDCA runs a primal-dual algorithm on the Lagrangian function estimated by critics. The primal player employs a no-regret policy optimization oracle to maximize the Lagrangian estimate given any choices of the critics and the dual player. The dual player employs a no-regret online linear optimization oracle to minimize the Lagrangian estimate given any choices of the critics and the primal player. We show that PDCA can successfully find a near saddle point of the Lagrangian, which is nearly optimal for the constrained RL problem. Unlike previous work that requires concentrability and strong Bellman completeness assumptions, PDCA only requires concentrability and value function/marginalized importance weight realizability assumptions.  ( 2 min )
    Malafide: a novel adversarial convolutive noise attack against deepfake and spoofing detection systems. (arXiv:2306.07655v1 [eess.AS])
    We present Malafide, a universal adversarial attack against automatic speaker verification (ASV) spoofing countermeasures (CMs). By introducing convolutional noise using an optimised linear time-invariant filter, Malafide attacks can be used to compromise CM reliability while preserving other speech attributes such as quality and the speaker's voice. In contrast to other adversarial attacks proposed recently, Malafide filters are optimised independently of the input utterance and duration, are tuned instead to the underlying spoofing attack, and require the optimisation of only a small number of filter coefficients. Even so, they degrade CM performance estimates by an order of magnitude, even in black-box settings, and can also be configured to overcome integrated CM and ASV subsystems. Integrated solutions that use self-supervised learning CMs, however, are more robust, under both black-box and white-box settings.  ( 2 min )
    Kernelized Reinforcement Learning with Order Optimal Regret Bounds. (arXiv:2306.07745v1 [cs.LG])
    Reinforcement learning (RL) has shown empirical success in various real world settings with complex models and large state-action spaces. The existing analytical results, however, typically focus on settings with a small number of state-actions or simple models such as linearly modeled state-action value functions. To derive RL policies that efficiently handle large state-action spaces with more general value functions, some recent works have considered nonlinear function approximation using kernel ridge regression. We propose $\pi$-KRVI, an optimistic modification of least-squares value iteration, when the state-action value function is represented by an RKHS. We prove the first order-optimal regret guarantees under a general setting. Our results show a significant polynomial in the number of episodes improvement over the state of the art. In particular, with highly non-smooth kernels (such as Neural Tangent kernel or some Matérn kernels) the existing results lead to trivial (superlinear in the number of episodes) regret bounds. We show a sublinear regret bound that is order optimal in the case of Matérn kernels where a lower bound on regret is known.  ( 2 min )
    Flatter, faster: scaling momentum for optimal speedup of SGD. (arXiv:2210.16400v2 [cs.LG] UPDATED)
    Commonly used optimization algorithms often show a trade-off between good generalization and fast training times. For instance, stochastic gradient descent (SGD) tends to have good generalization; however, adaptive gradient methods have superior training times. Momentum can help accelerate training with SGD, but so far there has been no principled way to select the momentum hyperparameter. Here we study training dynamics arising from the interplay between SGD with label noise and momentum in the training of overparametrized neural networks. We find that scaling the momentum hyperparameter $1-\beta$ with the learning rate to the power of $2/3$ maximally accelerates training, without sacrificing generalization. To analytically derive this result we develop an architecture-independent framework, where the main assumption is the existence of a degenerate manifold of global minimizers, as is natural in overparametrized models. Training dynamics display the emergence of two characteristic timescales that are well-separated for generic values of the hyperparameters. The maximum acceleration of training is reached when these two timescales meet, which in turn determines the scaling limit we propose. We confirm our scaling rule for synthetic regression problems (matrix sensing and teacher-student paradigm) and classification for realistic datasets (ResNet-18 on CIFAR10, 6-layer MLP on FashionMNIST), suggesting the robustness of our scaling rule to variations in architectures and datasets.  ( 2 min )
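The abstract's scaling rule can be stated as a one-liner: pick the momentum hyperparameter $\beta$ so that $1-\beta \propto \eta^{2/3}$ in the learning rate $\eta$. A hedged sketch (only the 2/3 exponent comes from the paper; the constant, fixed here via a reference point, is an assumption):

```python
def momentum_from_lr(lr, ref_lr=0.1, ref_beta=0.9):
    """Scale 1 - beta with lr**(2/3), anchored at (ref_lr, ref_beta)."""
    c = (1.0 - ref_beta) / ref_lr ** (2.0 / 3.0)
    return 1.0 - c * lr ** (2.0 / 3.0)
```

For example, dividing the learning rate by 8 shrinks $1-\beta$ by a factor of $8^{2/3}=4$, so $\beta$ moves from 0.9 to 0.975.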
    BoardgameQA: A Dataset for Natural Language Reasoning with Contradictory Information. (arXiv:2306.07934v1 [cs.CL])
    Automated reasoning with unstructured natural text is a key requirement for many potential applications of NLP and for developing robust AI systems. Recently, Language Models (LMs) have demonstrated complex reasoning capacities even without any finetuning. However, existing evaluation for automated reasoning assumes access to a consistent and coherent set of information over which models reason. When reasoning in the real-world, the available information is frequently inconsistent or contradictory, and therefore models need to be equipped with a strategy to resolve such conflicts when they arise. One widely-applicable way of resolving conflicts is to impose preferences over information sources (e.g., based on source credibility or information recency) and adopt the source with higher preference. In this paper, we formulate the problem of reasoning with contradictory information guided by preferences over sources as the classical problem of defeasible reasoning, and develop a dataset called BoardgameQA for measuring the reasoning capacity of LMs in this setting. BoardgameQA also incorporates reasoning with implicit background knowledge, to better reflect reasoning problems in downstream applications. We benchmark various LMs on BoardgameQA and the results reveal a significant gap in the reasoning capacity of state-of-the-art LMs on this problem, showing that reasoning with conflicting information does not surface out-of-the-box in LMs. While performance can be improved with finetuning, it nevertheless remains poor.  ( 2 min )
    Near-optimal Conservative Exploration in Reinforcement Learning under Episode-wise Constraints. (arXiv:2306.06265v1 [cs.LG] CROSS LISTED)
    This paper investigates conservative exploration in reinforcement learning where the performance of the learning agent is guaranteed to be above a certain threshold throughout the learning process. It focuses on the tabular episodic Markov Decision Process (MDP) setting that has finite states and actions. With the knowledge of an existing safe baseline policy, an algorithm termed as StepMix is proposed to balance the exploitation and exploration while ensuring that the conservative constraint is never violated in each episode with high probability. StepMix features a unique design of a mixture policy that adaptively and smoothly interpolates between the baseline policy and the optimistic policy. Theoretical analysis shows that StepMix achieves near-optimal regret order as in the constraint-free setting, indicating that obeying the stringent episode-wise conservative constraint does not compromise the learning performance. Besides, a randomization-based EpsMix algorithm is also proposed and shown to achieve the same performance as StepMix. The algorithm design and theoretical analysis are further extended to the setting where the baseline policy is not given a priori but must be learned from an offline dataset, and it is proved that similar conservative guarantee and regret can be achieved if the offline dataset is sufficiently large. Experiment results corroborate the theoretical analysis and demonstrate the effectiveness of the proposed conservative exploration strategies.  ( 2 min )
    Low-Resource White-Box Semantic Segmentation of Supporting Towers on 3D Point Clouds via Signature Shape Identification. (arXiv:2306.07809v1 [cs.CV])
    Research in 3D semantic segmentation has been increasing performance metrics, like the IoU, by scaling model complexity and computational resources, leaving behind researchers and practitioners that (1) cannot access the necessary resources and (2) do need transparency on the model decision mechanisms. In this paper, we propose SCENE-Net, a low-resource white-box model for 3D point cloud semantic segmentation. SCENE-Net identifies signature shapes on the point cloud via group equivariant non-expansive operators (GENEOs), providing intrinsic geometric interpretability. Our training time on a laptop is 85 min, and our inference time is 20 ms. SCENE-Net has 11 trainable geometrical parameters and requires fewer data than black-box models. SCENE-Net offers robustness to noisy labeling and data imbalance and has comparable IoU to state-of-the-art methods. With this paper, we release a 40,000 km labeled dataset of rural terrain point clouds and our code implementation.  ( 2 min )
    Compositionally Equivariant Representation Learning. (arXiv:2306.07783v1 [cs.CV])
    Deep learning models often need sufficient supervision (i.e. labelled data) in order to be trained effectively. By contrast, humans can swiftly learn to identify important anatomy in medical images like MRI and CT scans, with minimal guidance. This recognition capability easily generalises to new images from different medical facilities and to new tasks in different settings. This rapid and generalisable learning ability is largely due to the compositional structure of image patterns in the human brain, which are not well represented in current medical models. In this paper, we study the utilisation of compositionality in learning more interpretable and generalisable representations for medical image segmentation. Overall, we propose that the underlying generative factors that are used to generate the medical images satisfy compositional equivariance property, where each factor is compositional (e.g. corresponds to the structures in human anatomy) and also equivariant to the task. Hence, a good representation that approximates well the ground truth factor has to be compositionally equivariant. By modelling the compositional representations with learnable von-Mises-Fisher (vMF) kernels, we explore how different design and learning biases can be used to enforce the representations to be more compositionally equivariant under un-, weakly-, and semi-supervised settings. Extensive results show that our methods achieve the best performance over several strong baselines on the task of semi-supervised domain-generalised medical image segmentation. Code will be made publicly available upon acceptance at https://github.com/vios-s.  ( 2 min )
    Concentration Bounds for Discrete Distribution Estimation in KL Divergence. (arXiv:2302.06869v2 [stat.ML] UPDATED)
    We study the problem of discrete distribution estimation in KL divergence and provide concentration bounds for the Laplace estimator. We show that the deviation from mean scales as $\sqrt{k}/n$ when $n \ge k$, improving upon the best prior result of $k/n$. We also establish a matching lower bound that shows that our bounds are tight up to polylogarithmic factors.  ( 2 min )
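The objects in this abstract are elementary to state in code: the Laplace (add-one) estimator of a discrete distribution, and the KL divergence whose concentration the paper bounds. A small self-contained sketch of just these two quantities (not the paper's proof machinery):

```python
import math

def laplace_estimate(counts):
    """Add-one smoothed estimate of a k-symbol distribution from counts."""
    k, n = len(counts), sum(counts)
    return [(c + 1) / (n + k) for c in counts]

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as aligned lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

The paper's result concerns how `kl_divergence(p_true, laplace_estimate(counts))` deviates from its mean as the sample size n grows, with the deviation scaling as sqrt(k)/n once n >= k.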
    SRATTA : Sample Re-ATTribution Attack of Secure Aggregation in Federated Learning. (arXiv:2306.07644v1 [cs.LG])
    We consider a cross-silo federated learning (FL) setting where a machine learning model with a fully connected first layer is trained between different clients and a central server using FedAvg, and where the aggregation step can be performed with secure aggregation (SA). We present SRATTA an attack relying only on aggregated models which, under realistic assumptions, (i) recovers data samples from the different clients, and (ii) groups data samples coming from the same client together. While sample recovery has already been explored in an FL setting, the ability to group samples per client, despite the use of SA, is novel. This poses a significant unforeseen security threat to FL and effectively breaks SA. We show that SRATTA is both theoretically grounded and can be used in practice on realistic models and datasets. We also propose counter-measures, and claim that clients should play an active role to guarantee their privacy during training.  ( 2 min )
    Oracle-Efficient Pessimism: Offline Policy Optimization in Contextual Bandits. (arXiv:2306.07923v1 [cs.LG])
    We consider policy optimization in contextual bandits, where one is given a fixed dataset of logged interactions. While pessimistic regularizers are typically used to mitigate distribution shift, prior implementations thereof are not computationally efficient. We present the first oracle-efficient algorithm for pessimistic policy optimization: it reduces to supervised learning, leading to broad applicability. We also obtain best-effort statistical guarantees analogous to those for pessimistic approaches in prior work. We instantiate our approach for both discrete and continuous actions. We perform extensive experiments in both settings, showing advantage over unregularized policy optimization across a wide range of configurations.  ( 2 min )
    Online Prototype Alignment for Few-shot Policy Transfer. (arXiv:2306.07307v1 [cs.LG])
    Domain adaptation in reinforcement learning (RL) mainly deals with the changes of observation when transferring the policy to a new environment. Many traditional approaches of domain adaptation in RL manage to learn a mapping function between the source and target domain in explicit or implicit ways. However, they typically require access to abundant data from the target domain. Besides, they often rely on visual clues to learn the mapping function and may fail when the source domain looks quite different from the target domain. To address these problems, we propose a novel framework Online Prototype Alignment (OPA) to learn the mapping function based on the functional similarity of elements and is able to achieve the few-shot policy transfer within only several episodes. The key insight of OPA is to introduce an exploration mechanism that can interact with the unseen elements of the target domain in an efficient and purposeful manner, and then connect them with the seen elements in the source domain according to their functionalities (instead of visual clues). Experimental results show that when the target domain looks visually different from the source domain, OPA can achieve better transfer performance even with much fewer samples from the target domain, outperforming prior methods.  ( 2 min )
    Automated 3D Pre-Training for Molecular Property Prediction. (arXiv:2306.07812v1 [q-bio.QM])
    Molecular property prediction is an important problem in drug discovery and materials science. As geometric structures have been demonstrated necessary for molecular property prediction, 3D information has been combined with various graph learning methods to boost prediction performance. However, obtaining the geometric structure of molecules is not feasible in many real-world applications due to the high computational cost. In this work, we propose a novel 3D pre-training framework (dubbed 3D PGT), which pre-trains a model on 3D molecular graphs, and then fine-tunes it on molecular graphs without 3D structures. Based on the fact that bond length, bond angle, and dihedral angle are three basic geometric descriptors corresponding to a complete molecular 3D conformer, we first develop a multi-task generative pre-training framework based on these three attributes. Next, to automatically fuse these three generative tasks, we design a surrogate metric using the \textit{total energy} to search for the weight distribution of the three pretext tasks, since the total energy corresponds to the quality of the 3D conformer. Extensive experiments on 2D molecular graphs are conducted to demonstrate the accuracy, efficiency and generalization ability of the proposed 3D PGT compared to various pre-training baselines.  ( 2 min )
    On the Robustness of Removal-Based Feature Attributions. (arXiv:2306.07462v1 [cs.LG])
    To explain complex models based on their inputs, many feature attribution methods have been developed that assign importance scores to input features. However, some recent work challenges the robustness of feature attributions by showing that these methods are sensitive to input and model perturbations, while other work addresses this robustness issue by proposing robust attribution methods and model modifications. Nevertheless, previous work on attribution robustness has focused primarily on gradient-based feature attributions. In contrast, the robustness properties of removal-based attribution methods are not comprehensively well understood. To bridge this gap, we theoretically characterize the robustness of removal-based feature attributions. Specifically, we provide a unified analysis of such methods and prove upper bounds for the difference between intact and perturbed attributions, under settings of both input and model perturbations. Our empirical experiments on synthetic and real-world data validate our theoretical results and demonstrate their practical implications.  ( 2 min )
    Adversarial Attacks on the Interpretation of Neuron Activation Maximization. (arXiv:2306.07397v1 [cs.LG])
    The internal functional behavior of trained Deep Neural Networks is notoriously difficult to interpret. Activation-maximization approaches are one set of techniques used to interpret and analyze trained deep-learning models. These consist in finding inputs that maximally activate a given neuron or feature map. These inputs can be selected from a data set or obtained by optimization. However, interpretability methods may be subject to being deceived. In this work, we consider the concept of an adversary manipulating a model for the purpose of deceiving the interpretation. We propose an optimization framework for performing this manipulation and demonstrate a number of ways that popular activation-maximization interpretation techniques associated with CNNs can be manipulated to change the interpretations, shedding light on the reliability of these methods.  ( 2 min )
    Inferring dynamic regulatory interaction graphs from time series data with perturbations. (arXiv:2306.07803v1 [cs.LG])
    Complex systems are characterized by intricate interactions between entities that evolve dynamically over time. Accurate inference of these dynamic relationships is crucial for understanding and predicting system behavior. In this paper, we propose Regulatory Temporal Interaction Network Inference (RiTINI) for inferring time-varying interaction graphs in complex systems using a novel combination of space-and-time graph attentions and graph neural ordinary differential equations (ODEs). RiTINI leverages time-lapse signals on a graph prior, as well as perturbations of signals at various nodes in order to effectively capture the dynamics of the underlying system. This approach is distinct from traditional causal inference networks, which are limited to inferring acyclic and static graphs. In contrast, RiTINI can infer cyclic, directed, and time-varying graphs, providing a more comprehensive and accurate representation of complex systems. The graph attention mechanism in RiTINI allows the model to adaptively focus on the most relevant interactions in time and space, while the graph neural ODEs enable continuous-time modeling of the system's dynamics. We evaluate RiTINI's performance on various simulated and real-world datasets, demonstrating its state-of-the-art capability in inferring interaction graphs compared to previous methods.  ( 2 min )
    Kernel Random Projection Depth for Outlier Detection. (arXiv:2306.07056v2 [stat.ML] UPDATED)
    This paper proposes an extension of Random Projection Depth (RPD) to cope with multiple modalities and non-convexity on data clouds. In the framework of the proposed method, the RPD is computed in a reproducing kernel Hilbert space. With the help of kernel principal component analysis, we expect that the proposed method can cope with the above multiple modalities and non-convexity. The experimental results demonstrate that the proposed method outperforms RPD and is comparable to other existing detection models on benchmark datasets regarding Area Under the Curves (AUCs) of Receiver Operating Characteristic (ROC).  ( 2 min )
    Additive Causal Bandits with Unknown Graph. (arXiv:2306.07858v1 [cs.LG])
    We explore algorithms to select actions in the causal bandit setting where the learner can choose to intervene on a set of random variables related by a causal graph, and the learner sequentially chooses interventions and observes a sample from the interventional distribution. The learner's goal is to quickly find the intervention, among all interventions on observable variables, that maximizes the expectation of an outcome variable. We depart from previous literature by assuming no knowledge of the causal graph except that latent confounders between the outcome and its ancestors are not present. We first show that the unknown graph problem can be exponentially hard in the parents of the outcome. To remedy this, we adopt an additional additive assumption on the outcome which allows us to solve the problem by casting it as an additive combinatorial linear bandit problem with full-bandit feedback. We propose a novel action-elimination algorithm for this setting, show how to apply this algorithm to the causal bandit problem, provide sample complexity bounds, and empirically validate our findings on a suite of randomly generated causal models, effectively showing that one does not need to explicitly learn the parents of the outcome to identify the best intervention.  ( 2 min )
    Large Language Models Sometimes Generate Purely Negatively-Reinforced Text. (arXiv:2306.07567v1 [cs.LG])
    When using adversarial training, it is common practice to train against the most egregious failures. However, this might imply using examples with sensitive information (such as leaked passwords or security vulnerabilities) as training data. One might assume that language models trained with gradient descent never generate text snippets which were only present in examples associated with the lowest possible reward. In this paper, we show that this assumption is wrong: in some situations, large language models do learn from such negatively-reinforced examples. We present a specific training setup that enables Pythia-160M to generate passwords with a probability slightly greater than chance, despite only showing it these passwords on examples where the model is incentivized to not output these passwords. Our code is available at https://github.com/FabienRoger/Learning-From-Negative-Examples  ( 2 min )
    Optimal Inference in Contextual Stochastic Block Models. (arXiv:2306.07948v1 [cs.SI])
    The contextual stochastic block model (cSBM) was proposed for unsupervised community detection on attributed graphs where both the graph and the high-dimensional node information correlate with node labels. In the context of machine learning on graphs, the cSBM has been widely used as a synthetic dataset for evaluating the performance of graph-neural networks (GNNs) for semi-supervised node classification. We consider a probabilistic Bayes-optimal formulation of the inference problem and we derive a belief-propagation-based algorithm for the semi-supervised cSBM; we conjecture it is optimal in the considered setting and we provide its implementation. We show that there can be a considerable gap between the accuracy reached by this algorithm and the performance of the GNN architectures proposed in the literature. This suggests that the cSBM, along with the comparison to the performance of the optimal algorithm, readily accessible via our implementation, can be instrumental in the development of more performant GNN architectures.  ( 2 min )
    Conditional Generative Models for Learning Stochastic Processes. (arXiv:2304.10382v3 [quant-ph] UPDATED)
    A framework to learn a multi-modal distribution is proposed, denoted as the Conditional Quantum Generative Adversarial Network (C-qGAN). The neural network structure is strictly within a quantum circuit and, as a consequence, is shown to represent a more efficient state preparation procedure than current methods. This methodology has the potential to speed up algorithms, such as Monte Carlo analysis. In particular, after demonstrating the effectiveness of the network in the learning task, the technique is applied to price Asian option derivatives, providing the foundation for further research on other path-dependent options.  ( 2 min )
    ChatGPT vs Human-authored Text: Insights into Controllable Text Summarization and Sentence Style Transfer. (arXiv:2306.07799v1 [cs.CL])
    Large-scale language models, like ChatGPT, have garnered significant media attention and stunned the public with their remarkable capacity for generating coherent text from short natural language prompts. In this paper, we aim to conduct a systematic inspection of ChatGPT's performance in two controllable generation tasks, with respect to ChatGPT's ability to adapt its output to different target audiences (expert vs. layman) and writing styles (formal vs. informal). Additionally, we evaluate the faithfulness of the generated text, and compare the model's performance with human-authored texts. Our findings indicate that the stylistic variations produced by humans are considerably larger than those demonstrated by ChatGPT, and the generated texts diverge from human samples in several characteristics, such as the distribution of word types. Moreover, we observe that ChatGPT sometimes incorporates factual errors or hallucinations when adapting the text to suit a specific style.  ( 2 min )
    The Rank-Reduced Kalman Filter: Approximate Dynamical-Low-Rank Filtering In High Dimensions. (arXiv:2306.07774v1 [stat.ML])
    Inference and simulation in the context of high-dimensional dynamical systems remain computationally challenging problems. Some form of dimensionality reduction is required to make the problem tractable in general. In this paper, we propose a novel approximate Gaussian filtering and smoothing method which propagates low-rank approximations of the covariance matrices. This is accomplished by projecting the Lyapunov equations associated with the prediction step to a manifold of low-rank matrices, which are then solved by a recently developed, numerically stable, dynamical low-rank integrator. Meanwhile, the update steps are made tractable by noting that the covariance update only transforms the column space of the covariance matrix, which is low-rank by construction. The algorithm differentiates itself from existing ensemble-based approaches in that the low-rank approximations of the covariance matrices are deterministic, rather than stochastic. Crucially, this enables the method to reproduce the exact Kalman filter as the low-rank dimension approaches the true dimensionality of the problem. Our method reduces computational complexity from cubic (for the Kalman filter) to \emph{quadratic} in the state-space size in the worst-case, and can achieve \emph{linear} complexity if the state-space model satisfies certain criteria. Through a set of experiments in classical data-assimilation and spatio-temporal regression, we show that the proposed method consistently outperforms the ensemble-based methods in terms of error in the mean and covariance with respect to the exact Kalman filter. This comes at no additional cost in terms of asymptotic computational complexity.  ( 2 min )
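The structural fact the method exploits — the Kalman covariance update only transforms the column space, so a low-rank factorization cannot gain rank — can be checked numerically (dimensions below are arbitrary, and this sketch uses the dense update rather than the paper's factored one):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 50, 5, 3                      # state dim, covariance rank, obs dim

# Low-rank prior covariance P = L L^T (rank r by construction).
L = rng.normal(size=(n, r))
P = L @ L.T
H = rng.normal(size=(m, n))             # observation operator
R = np.eye(m)                           # observation noise covariance

# Standard Kalman covariance update: P+ = (I - K H) P.
S = H @ P @ H.T + R
K = P @ H.T @ np.linalg.inv(S)
P_post = (np.eye(n) - K @ H) @ P

# Left-multiplication only transforms the column space of P,
# so the rank of the posterior covariance cannot exceed r.
rank_prior = np.linalg.matrix_rank(P)
rank_post = np.linalg.matrix_rank(P_post)
```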
    Towards a Machine-Learned Poisson Solver for Low-Temperature Plasma Simulations in Complex Geometries. (arXiv:2306.07604v1 [physics.comp-ph])
    Poisson's equation plays an important role in modeling many physical systems. In electrostatic self-consistent low-temperature plasma (LTP) simulations, Poisson's equation is solved at each simulation time step, which can amount to a significant computational cost for the entire simulation. In this paper, we describe the development of a generic machine-learned Poisson solver specifically designed for the requirements of LTP simulations in complex 2D reactor geometries on structured Cartesian grids. Here, the reactor geometries can consist of inner electrodes and dielectric materials as often found in LTP simulations. The approach leverages a hybrid CNN-transformer network architecture in combination with a weighted multiterm loss function. We train the network using highly-randomized synthetic data to ensure the generalizability of the learned solver to unseen reactor geometries. The results demonstrate that the learned solver is able to produce quantitatively and qualitatively accurate solutions. Furthermore, it generalizes well on new reactor geometries such as reference geometries found in the literature. To increase the numerical accuracy of the solutions required in LTP simulations, we employ a conventional iterative solver to refine the raw predictions, especially to recover the high-frequency features not resolved by the initial prediction. With this, the proposed learned Poisson solver provides the required accuracy and is potentially faster than a pure GPU-based conventional iterative solver. This opens up new possibilities for developing a generic and high-performing learned Poisson solver for LTP systems in complex geometries.  ( 2 min )
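The refinement step can be illustrated with a generic iterative solver: below, plain Jacobi sweeps (a stand-in for whatever iterative method one pairs with the network) polish a crude initial guess for a 2D Poisson problem with a manufactured solution. Nothing here is specific to the paper's solver, network, or geometry handling:

```python
import numpy as np

def jacobi_refine(u0, f, h, n_iter=2000):
    """Refine an initial guess u0 for -laplace(u) = f on the unit square
    (zero Dirichlet boundary) with Jacobi sweeps on a uniform grid."""
    u = u0.copy()
    for _ in range(n_iter):
        u[1:-1, 1:-1] = 0.25 * (u[2:, 1:-1] + u[:-2, 1:-1]
                                + u[1:-1, 2:] + u[1:-1, :-2]
                                + h * h * f[1:-1, 1:-1])
    return u

# Manufactured solution u* = sin(pi x) sin(pi y), so f = 2 pi^2 u*.
n = 33
x = np.linspace(0.0, 1.0, n)
X, Y = np.meshgrid(x, x, indexing="ij")
u_exact = np.sin(np.pi * X) * np.sin(np.pi * Y)
f = 2.0 * np.pi ** 2 * u_exact

# A deliberately crude "raw prediction" standing in for the network output.
u_raw = 0.8 * u_exact
u_ref = jacobi_refine(u_raw, f, h=x[1] - x[0])

err_raw = np.abs(u_raw - u_exact).max()
err_ref = np.abs(u_ref - u_exact).max()
```

A good initial prediction mainly buys fewer refinement iterations; the iterative stage restores the high-frequency accuracy either way.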
    Energy-based Models are Zero-Shot Planners for Compositional Scene Rearrangement. (arXiv:2304.14391v3 [cs.RO] UPDATED)
    Language is compositional; an instruction can express multiple relation constraints to hold among objects in a scene that a robot is tasked to rearrange. Our focus in this work is an instructable scene-rearranging framework that generalizes to longer instructions and to spatial concept compositions never seen at training time. We propose to represent language-instructed spatial concepts with energy functions over relative object arrangements. A language parser maps instructions to corresponding energy functions and an open-vocabulary visual-language model grounds their arguments to relevant objects in the scene. We generate goal scene configurations by gradient descent on the sum of energy functions, one per language predicate in the instruction. Local vision-based policies then re-locate objects to the inferred goal locations. We test our model on established instruction-guided manipulation benchmarks, as well as benchmarks of compositional instructions we introduce. We show our model can execute highly compositional instructions zero-shot in simulation and in the real world. It outperforms language-to-action reactive policies and Large Language Model planners by a large margin, especially for long instructions that involve compositions of multiple spatial concepts. Simulation and real-world robot execution videos, as well as our code and datasets are publicly available on our website: https://ebmplanner.github.io.  ( 2 min )
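The planning step — gradient descent on a sum of per-predicate energies over object poses — can be sketched with a hypothetical "left of" energy; the softplus form and the margin are my choices for illustration, not the paper's learned energies:

```python
import numpy as np

def energy_left_of(pos, i, j, margin=1.0):
    # Penalize object i not being at least `margin` to the left of object j.
    gap = pos[i, 0] - pos[j, 0] + margin
    return np.logaddexp(0.0, gap)        # softplus: a smooth hinge

def grad_energy_left_of(pos, i, j, margin=1.0):
    g = np.zeros_like(pos)
    gap = pos[i, 0] - pos[j, 0] + margin
    s = 1.0 / (1.0 + np.exp(-gap))       # d softplus / d gap
    g[i, 0] += s
    g[j, 0] -= s
    return g

# Two predicates, "0 left of 1" and "1 left of 2"; descend their summed energy.
pos = np.array([[2.0, 0.0], [0.0, 0.0], [-1.0, 0.0]])
e_start = energy_left_of(pos, 0, 1) + energy_left_of(pos, 1, 2)
for _ in range(500):
    g = grad_energy_left_of(pos, 0, 1) + grad_energy_left_of(pos, 1, 2)
    pos -= 0.1 * g
e_end = energy_left_of(pos, 0, 1) + energy_left_of(pos, 1, 2)
```

After descent, the goal configuration satisfies both spatial relations; local policies would then move the objects to those inferred locations.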
    iPDP: On Partial Dependence Plots in Dynamic Modeling Scenarios. (arXiv:2306.07775v1 [cs.LG])
    Post-hoc explanation techniques such as the well-established partial dependence plot (PDP), which investigates feature dependencies, are used in explainable artificial intelligence (XAI) to understand black-box machine learning models. While many real-world applications require dynamic models that constantly adapt over time and react to changes in the underlying distribution, XAI, so far, has primarily considered static learning environments, where models are trained in a batch mode and remain unchanged. We thus propose a novel model-agnostic XAI framework called incremental PDP (iPDP) that extends the PDP to extract time-dependent feature effects in non-stationary learning environments. We formally analyze iPDP and show that it approximates a time-dependent variant of the PDP that properly reacts to real and virtual concept drift. The time-sensitivity of iPDP is controlled by a single smoothing parameter, which directly corresponds to the variance and the approximation error of iPDP in a static learning environment. We illustrate the efficacy of iPDP by showcasing an example application for drift detection and conducting multiple experiments on real-world and synthetic data sets and streams.  ( 2 min )
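A classical PDP averages predictions with one feature forced to each grid value; one simple way to make it time-sensitive (an illustration of the idea, not iPDP's exact estimator) is to exponentially smooth per-batch PDPs over a stream, with `alpha` playing the role of the smoothing parameter:

```python
import numpy as np

def partial_dependence(model, X, feature, grid):
    """Classical PDP: average prediction with `feature` forced to each grid value."""
    pdp = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v
        pdp.append(model(Xv).mean())
    return np.array(pdp)

def ipdp_update(pdp_old, model, X_batch, feature, grid, alpha=0.1):
    """Illustrative incremental variant: an exponential moving average
    of per-batch PDPs over a data stream."""
    pdp_batch = partial_dependence(model, X_batch, feature, grid)
    if pdp_old is None:
        return pdp_batch
    return (1.0 - alpha) * pdp_old + alpha * pdp_batch

rng = np.random.default_rng(0)
grid = np.linspace(-2.0, 2.0, 9)
model = lambda X: X[:, 0] ** 2 + X[:, 1]   # toy "black box"

pdp = None
for _ in range(50):                        # stream of batches
    X_batch = rng.normal(size=(64, 2))
    pdp = ipdp_update(pdp, model, X_batch, 0, grid, alpha=0.2)
```

A larger `alpha` tracks drift faster at the cost of higher variance, mirroring the trade-off the abstract describes.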
    Does generalization performance of $l^q$ regularization learning depend on $q$? A negative example. (arXiv:1307.6616v2 [cs.LG] UPDATED)
    $l^q$-regularization has been demonstrated to be an attractive technique in machine learning and statistical modeling. It attempts to improve the generalization (prediction) capability of a machine (model) by appropriately shrinking its coefficients. The shape of a $l^q$ estimator differs with varying choices of the regularization order $q$. In particular, $l^1$ leads to the LASSO estimate, while $l^{2}$ corresponds to the smooth ridge regression. This makes the order $q$ a potential tuning parameter in applications. To facilitate the use of $l^{q}$-regularization, we seek a modeling strategy in which an elaborate selection of $q$ can be avoided. In this spirit, we place our investigation within a general framework of $l^{q}$-regularized kernel learning under a sample dependent hypothesis space (SDHS). For a designated class of kernel functions, we show that all $l^{q}$ estimators for $0< q < \infty$ attain similar generalization error bounds. These bounds are almost optimal in the sense that, up to a logarithmic factor, the upper and lower bounds are asymptotically identical. This finding tentatively reveals that, in some modeling contexts, the choice of $q$ might not have a strong impact on the generalization capability. From this perspective, $q$ can be specified arbitrarily, or merely by other, non-generalization criteria such as smoothness, computational complexity, or sparsity.  ( 3 min )
    Gibbs-Duhem-Informed Neural Networks for Binary Activity Coefficient Prediction. (arXiv:2306.07937v1 [physics.chem-ph])
    We propose Gibbs-Duhem-informed neural networks for the prediction of binary activity coefficients at varying compositions. That is, we include the Gibbs-Duhem equation explicitly in the loss function for training neural networks, which is straightforward in standard machine learning (ML) frameworks enabling automatic differentiation. In contrast to recent hybrid ML approaches, our approach does not rely on embedding a specific thermodynamic model inside the neural network and corresponding prediction limitations. Rather, Gibbs-Duhem consistency serves as regularization, with the flexibility of ML models being preserved. Our results show increased thermodynamic consistency and generalization capabilities for activity coefficient predictions by Gibbs-Duhem-informed graph neural networks and matrix completion methods. We also find that the model architecture, particularly the activation function, can have a strong influence on the prediction quality. The approach can be easily extended to account for other thermodynamic consistency conditions.  ( 2 min )
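The consistency condition for a binary mixture at constant temperature and pressure is x1 * dln(gamma1)/dx1 + x2 * dln(gamma2)/dx1 = 0 (with x2 = 1 - x1). A finite-difference residual makes the regularization idea concrete; the two-suffix Margules model below satisfies the condition exactly, while an arbitrary pair of curves does not (in training one would use autodiff, not finite differences):

```python
import numpy as np

def gibbs_duhem_residual(ln_g1, ln_g2, x1, eps=1e-4):
    """Finite-difference residual of x1*dlng1/dx1 + x2*dlng2/dx1 = 0,
    where ln_g1 and ln_g2 map composition x1 to ln(gamma)."""
    d1 = (ln_g1(x1 + eps) - ln_g1(x1 - eps)) / (2 * eps)
    d2 = (ln_g2(x1 + eps) - ln_g2(x1 - eps)) / (2 * eps)
    return x1 * d1 + (1.0 - x1) * d2

# A two-suffix Margules model is Gibbs-Duhem consistent by construction ...
A = 0.8
margules_1 = lambda x1: A * (1.0 - x1) ** 2
margules_2 = lambda x1: A * x1 ** 2

# ... while an arbitrary pair of curves generally is not.
bad_1 = lambda x1: A * (1.0 - x1) ** 2
bad_2 = lambda x1: 0.3 * x1

x = np.linspace(0.05, 0.95, 19)
res_consistent = np.abs(gibbs_duhem_residual(margules_1, margules_2, x)).max()
res_violating = np.abs(gibbs_duhem_residual(bad_1, bad_2, x)).max()
```

In a Gibbs-Duhem-informed loss, the mean squared residual over sampled compositions would simply be added to the data-fitting term.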
    Learning under Selective Labels with Heterogeneous Decision-makers: An Instrumental Variable Approach. (arXiv:2306.07566v1 [stat.ML])
    We study the problem of learning with selectively labeled data, which arises when outcomes are only partially labeled due to historical decision-making. The labeled data distribution may substantially differ from the full population, especially when the historical decisions and the target outcome can be simultaneously affected by some unobserved factors. Consequently, learning with only the labeled data may lead to severely biased results when deployed to the full population. Our paper tackles this challenge by exploiting the fact that in many applications the historical decisions were made by a set of heterogeneous decision-makers. In particular, we analyze this setup in a principled instrumental variable (IV) framework. We establish conditions for the full-population risk of any given prediction rule to be point-identified from the observed data and provide sharp risk bounds when the point identification fails. We further propose a weighted learning approach that learns prediction rules robust to the label selection bias in both identification settings. Finally, we apply our proposed approach to a semi-synthetic financial dataset and demonstrate its superior performance in the presence of selection bias.  ( 2 min )
    Domain Adaptation with Incomplete Target Domains. (arXiv:2012.01606v2 [cs.LG] UPDATED)
    Domain adaptation, as a task of reducing the annotation cost in a target domain by exploiting the existing labeled data in an auxiliary source domain, has received a lot of attention in the research community. However, standard domain adaptation has assumed perfectly observed data in both domains, while in real world applications the existence of missing data can be prevalent. In this paper, we tackle a more challenging domain adaptation scenario where one has an incomplete target domain with partially observed data. We propose an Incomplete Data Imputation based Adversarial Network (IDIAN) model to address this new domain adaptation challenge. In the proposed model, we design a data imputation module to fill the missing feature values based on the partial observations in the target domain, while aligning the two domains via deep adversarial adaptation. We conduct experiments on both cross-domain benchmark tasks and a real world adaptation task with imperfect target domains. The experimental results demonstrate the effectiveness of the proposed method.  ( 2 min )
    Learning Any-View 6DoF Robotic Grasping in Cluttered Scenes via Neural Surface Rendering. (arXiv:2306.07392v1 [cs.RO])
    Robotic manipulation is critical for enabling robotic agents in various application domains, like intelligent assistance. A major challenge therein is the effective 6DoF grasping of objects in cluttered environments from any viewpoint without requiring additional scene exploration. We introduce $\textit{NeuGraspNet}$, a novel method for 6DoF grasp detection that leverages recent advances in neural volumetric representations and surface rendering. Our approach learns both global (scene-level) and local (grasp-level) neural surface representations, enabling effective and fully implicit 6DoF grasp quality prediction, even in unseen parts of the scene. Further, we reinterpret grasping as a local neural surface rendering problem, allowing the model to encode the interaction between the robot's end-effector and the object's surface geometry. NeuGraspNet operates on single viewpoints and can sample grasp candidates in occluded scenes, outperforming existing implicit and semi-implicit baseline methods in the literature. We demonstrate the real-world applicability of NeuGraspNet with a mobile manipulator robot, grasping in open spaces with clutter by rendering the scene, reasoning about graspable areas of different objects, and selecting grasps likely to succeed without colliding with the environment. Visit our project website: https://sites.google.com/view/neugraspnet  ( 2 min )
    SoK: Modeling Explainability in Security Analytics for Interpretability, Trustworthiness, and Usability. (arXiv:2210.17376v2 [cs.CR] UPDATED)
    Interpretability, trustworthiness, and usability are key considerations in high-stake security applications, especially when utilizing deep learning models. While these models are known for their high accuracy, they behave as black boxes in which identifying important features and factors that led to a classification or a prediction is difficult. This can lead to uncertainty and distrust, especially when an incorrect prediction results in severe consequences. Thus, explanation methods aim to provide insights into the inner workings of deep learning models. However, most explanation methods provide inconsistent explanations, have low fidelity, and are susceptible to adversarial manipulation, which can reduce model trustworthiness. This paper provides a comprehensive analysis of explainable methods and demonstrates their efficacy in three distinct security applications: anomaly detection using system logs, malware prediction, and detection of adversarial images. Our quantitative and qualitative analysis reveals serious limitations and concerns in state-of-the-art explanation methods in all three applications. We show that explanation methods for security applications necessitate distinct characteristics, such as stability, fidelity, robustness, and usability, among others, which we outline as the prerequisites for trustworthy explanation methods.  ( 2 min )
    DIVA: A Dirichlet Process Based Incremental Deep Clustering Algorithm via Variational Auto-Encoder. (arXiv:2305.14067v2 [cs.LG] UPDATED)
    Generative model-based deep clustering frameworks excel in classifying complex data, but are limited in handling dynamic and complex features because they require prior knowledge of the number of clusters. In this paper, we propose a nonparametric deep clustering framework that employs an infinite mixture of Gaussians as a prior. Our framework utilizes a memoized online variational inference method that enables the "birth" and "merge" moves of clusters, allowing our framework to cluster data in a "dynamic-adaptive" manner, without requiring prior knowledge of the number of clusters. We name the framework DIVA, a Dirichlet Process-based Incremental deep clustering framework via Variational Auto-Encoder. DIVA outperforms state-of-the-art baselines in classifying complex data with dynamically changing features, particularly in the case of incremental features. We released our source code implementation at: https://github.com/Ghiara/diva  ( 2 min )
    HybridNet: Dual-Branch Fusion of Geometrical and Topological Views for VLSI Congestion Prediction. (arXiv:2305.05374v2 [cs.LG] UPDATED)
    Accurate early congestion prediction can prevent unpleasant surprises at the routing stage, playing a crucial role in assisting designers to iterate faster in VLSI design cycles. In this paper, we introduce a novel strategy to fully incorporate topological and geometrical features of circuits by making several key designs in our network architecture. To be more specific, we construct two individual graphs (geometry-graph, topology-graph) with distinct edge construction schemes according to their unique properties. We then propose a dual-branch network with different encoder layers in each pathway and aggregate representations with a sophisticated fusion strategy. Our network, named HybridNet, not only provides a simple yet effective way to capture the geometric interactions of cells, but also preserves the original topological relationships in the netlist. Experimental results on the ISPD2015 benchmarks show that we achieve an improvement of 10.9% compared to previous methods.  ( 2 min )
    BeliefPPG: Uncertainty-aware Heart Rate Estimation from PPG signals via Belief Propagation. (arXiv:2306.07730v1 [cs.LG])
    We present a novel learning-based method that achieves state-of-the-art performance on several heart rate estimation benchmarks extracted from photoplethysmography signals (PPG). We consider the evolution of the heart rate in the context of a discrete-time stochastic process that we represent as a hidden Markov model. We derive a distribution over possible heart rate values for a given PPG signal window through a trained neural network. Using belief propagation, we incorporate the statistical distribution of heart rate changes to refine these estimates in a temporal context. From this, we obtain a quantized probability distribution over the range of possible heart rate values that captures a meaningful and well-calibrated estimate of the inherent predictive uncertainty. We show the robustness of our method on eight public datasets with three different cross-validation experiments.
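The temporal fusion can be sketched as a standard HMM forward pass over quantized heart-rate bins: a Gaussian transition prior on rate changes is combined with per-window likelihoods. The likelihoods below are random stand-ins for the network output, and the bin count and widths are illustrative, not the paper's:

```python
import numpy as np

def forward_filter(likelihoods, transition, prior):
    """HMM forward pass: fuse per-window likelihoods with a
    transition prior over quantized heart-rate bins."""
    belief = prior.copy()
    beliefs = []
    for lik in likelihoods:
        belief = lik * (transition @ belief)
        belief /= belief.sum()
        beliefs.append(belief.copy())
    return np.array(beliefs)

n_bins = 41                                   # e.g. 40-200 bpm in 4-bpm steps
bins = np.arange(n_bins)

# Transition prior: heart rate changes slowly between windows (Gaussian).
diff = bins[:, None] - bins[None, :]
transition = np.exp(-0.5 * (diff / 1.5) ** 2)
transition /= transition.sum(axis=0, keepdims=True)   # columns sum to 1

# Stand-in for the network output: noisy likelihoods around the true bin.
rng = np.random.default_rng(0)
true_bin = 20
base = np.exp(-0.5 * ((bins - true_bin) / 3.0) ** 2)
likelihoods = [base * rng.uniform(0.5, 1.5, n_bins) for _ in range(10)]

prior = np.full(n_bins, 1.0 / n_bins)
beliefs = forward_filter(likelihoods, transition, prior)
estimate = beliefs[-1].argmax()
```

The final `beliefs` row is a calibrated-looking distribution over heart-rate bins, from which both a point estimate and an uncertainty measure can be read off.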
    Finite Gaussian Neurons: Defending against adversarial attacks by making neural networks say "I don't know". (arXiv:2306.07796v1 [cs.LG])
    Since 2014, artificial neural networks have been known to be vulnerable to adversarial attacks, which can fool the network into producing wrong or nonsensical outputs by making humanly imperceptible alterations to inputs. While defenses against adversarial attacks have been proposed, they usually involve retraining a new neural network from scratch, a costly task. In this work, I introduce the Finite Gaussian Neuron (FGN), a novel neuron architecture for artificial neural networks. My work aims to: easily convert existing models to the Finite Gaussian Neuron architecture, preserve the existing model's behavior on real data, and offer resistance against adversarial attacks. I show that converted and retrained Finite Gaussian Neural Networks (FGNN) always have lower confidence (i.e., are not overconfident) in their predictions over randomized and Fast Gradient Sign Method adversarial images when compared to classical neural networks, while maintaining high accuracy and confidence over real MNIST images. To further validate the capacity of Finite Gaussian Neurons to protect from adversarial attacks, I compare the behavior of FGNs to that of Bayesian Neural Networks against both randomized and adversarial images, and show how the behavior of the two architectures differs. Finally, I show some limitations of the FGN models by testing them on the more complex SPEECHCOMMANDS task, against the stronger Carlini-Wagner and Projected Gradient Descent adversarial attacks.
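One plausible formulation of such a neuron (an illustration of the idea, not necessarily the paper's exact parameterization) damps the usual affine-plus-sigmoid response with a Gaussian of the distance to a finite center, so activity vanishes far from the training data instead of extrapolating confidently:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgn(x, w, b, center, sigma):
    """Illustrative Finite Gaussian Neuron: the standard response is
    multiplied by a Gaussian envelope around `center`, so outputs
    decay to zero far from the data instead of staying confident."""
    linear = sigmoid(x @ w + b)
    gauss = np.exp(-np.sum((x - center) ** 2, axis=-1) / sigma ** 2)
    return linear * gauss

w = np.array([3.0, -2.0])
b = 0.5
center = np.zeros(2)

near = fgn(np.array([[0.5, 0.2]]), w, b, center, sigma=2.0)   # in-distribution
far = fgn(np.array([[50.0, -40.0]]), w, b, center, sigma=2.0)  # far away
```

A classical sigmoid neuron would saturate near 1 on the far-away input; the Gaussian envelope is what lets the network effectively say "I don't know".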
    VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion Models. (arXiv:2306.06874v2 [cs.CR] UPDATED)
    Diffusion Models (DMs) are state-of-the-art generative models that learn a reversible corruption process from iterative noise addition and denoising. They are the backbone of many generative AI applications, such as text-to-image conditional generation. However, recent studies have shown that basic unconditional DMs (e.g., DDPM and DDIM) are vulnerable to backdoor injection, a type of output manipulation attack triggered by a maliciously embedded pattern at model input. This paper presents a unified backdoor attack framework (VillanDiffusion) to expand the current scope of backdoor analysis for DMs. Our framework covers mainstream unconditional and conditional DMs (denoising-based and score-based) and various training-free samplers for holistic evaluations. Experiments show that our unified framework facilitates the backdoor analysis of different DM configurations and provides new insights into caption-based backdoor attacks on DMs.  ( 2 min )
    Fair Learning to Rank with Distribution-free Risk Control. (arXiv:2306.07188v2 [cs.LG] UPDATED)
    Learning to Rank (LTR) methods are vital in online economies, affecting users and item providers. Fairness in LTR models is crucial to allocate exposure proportionally to item relevance. The deterministic ranking model can lead to unfair exposure distribution when items with the same relevance receive slightly different scores. Stochastic LTR models, incorporating the Plackett-Luce (PL) model, address fairness issues but have limitations in computational cost and performance guarantees. To overcome these limitations, we propose FairLTR-RC, a novel post-hoc model-agnostic method. FairLTR-RC leverages a pretrained scoring function to create a stochastic LTR model, eliminating the need for expensive training. Furthermore, FairLTR-RC provides finite-sample guarantees on a user-specified utility using a distribution-free risk control framework. By additionally incorporating the Thresholded PL (TPL) model, we are able to achieve an effective trade-off between utility and fairness. Experimental results on several benchmark datasets demonstrate that FairLTR-RC significantly improves fairness in widely-used deterministic LTR models while guaranteeing a specified level of utility.
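Sampling rankings from a Plackett-Luce model induced by pretrained scores is cheap via the Gumbel-max trick: perturb the scores with Gumbel noise and sort. This is a generic PL fact, not FairLTR-RC's full procedure (which adds thresholding and risk control on top):

```python
import numpy as np

def sample_pl_ranking(scores, rng):
    """Draw one ranking from the Plackett-Luce model with log-worths
    `scores`, via the Gumbel-max trick (perturb and sort)."""
    gumbel = rng.gumbel(size=scores.shape)
    return np.argsort(-(scores + gumbel))

rng = np.random.default_rng(0)
scores = np.log(np.array([4.0, 2.0, 1.0]))   # PL worths in ratio 4:2:1

# Empirically, item 0 should be ranked first with probability ~4/7.
n = 20000
top = np.array([sample_pl_ranking(scores, rng)[0] for _ in range(n)])
freq_top0 = (top == 0).mean()
```

Randomizing rankings this way spreads exposure across items with similar scores, which is the mechanism behind the fairness improvement.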
    Hypergraph Artificial Benchmark for Community Detection (h-ABCD). (arXiv:2210.15009v3 [cs.SI] UPDATED)
    The Artificial Benchmark for Community Detection (ABCD) graph is a recently introduced random graph model with community structure and power-law distributions for both degrees and community sizes. The model generates graphs with properties similar to those of the well-known LFR model, and its main parameter can be tuned to mimic its counterpart in the LFR model, the mixing parameter. In this paper, we introduce a hypergraph counterpart of the ABCD model, h-ABCD, which produces random hypergraphs whose ground-truth community sizes and degrees follow power-law distributions. As in the original ABCD, the new model h-ABCD can produce hypergraphs with various levels of noise. More importantly, the model is flexible and can mimic any desired level of homogeneity of hyperedges that fall into one community. As a result, it can be used as a suitable synthetic playground for analyzing and tuning hypergraph community detection algorithms.
    Multimodal Audio-textual Architecture for Robust Spoken Language Understanding. (arXiv:2306.06819v2 [cs.CL] UPDATED)
    Recent voice assistants are usually based on the cascade spoken language understanding (SLU) solution, which consists of an automatic speech recognition (ASR) engine and a natural language understanding (NLU) system. Because such an approach relies on the ASR output, it often suffers from so-called ASR error propagation. In this work, we investigate the impact of this ASR error propagation on state-of-the-art NLU systems based on pre-trained language models (PLM), such as BERT and RoBERTa. Moreover, a multimodal language understanding (MLU) module is proposed to mitigate the SLU performance degradation caused by errors in the ASR transcript. The MLU benefits from self-supervised features learned from both audio and text modalities, specifically Wav2Vec for speech and BERT/RoBERTa for language. Our MLU combines an encoder network to embed the audio signal and a text encoder to process text transcripts, followed by a late fusion layer to fuse audio and text logits. We find that the proposed MLU is robust to poor-quality ASR transcripts, whereas the performance of BERT and RoBERTa is severely compromised. Our model is evaluated on five tasks from three SLU datasets, and robustness is tested using ASR transcripts from three ASR engines. Results show that the proposed approach effectively mitigates the ASR error propagation problem, surpassing the PLM models' performance across all datasets for the academic ASR engine.
    WildWood: a new Random Forest algorithm. (arXiv:2109.08010v2 [cs.LG] UPDATED)
    We introduce WildWood (WW), a new ensemble algorithm for supervised learning of the Random Forest (RF) type. While standard RF algorithms use bootstrap out-of-bag samples only to compute out-of-bag scores, WW uses these samples to produce improved predictions given by an aggregation of the predictions of all possible subtrees of each fully grown tree in the forest. This is achieved through aggregation with exponential weights over out-of-bag samples, computed exactly and very efficiently thanks to an algorithm called context tree weighting. This improvement, combined with a histogram strategy to accelerate split finding, makes WW fast and competitive compared with other well-established ensemble methods, such as standard RF and extreme gradient boosting algorithms.
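The exponential-weight principle behind the aggregation can be shown in miniature. WildWood applies it over all subtrees of each tree, computed efficiently with context tree weighting; this toy version just weights three fixed predictors by their held-out loss:

```python
import numpy as np

def exp_weight_aggregate(preds, losses, eta=1.0):
    """Aggregate predictors with exponential weights w_i ~ exp(-eta * L_i),
    where L_i is predictor i's loss on held-out (out-of-bag) samples."""
    w = np.exp(-eta * (losses - losses.min()))   # shift for numerical stability
    w /= w.sum()
    return w @ preds, w

# Three toy predictors of a constant target y = 1.0 ...
preds = np.array([1.02, 0.60, 1.90])             # their predictions
losses = (preds - 1.0) ** 2                      # held-out squared losses

agg, w = exp_weight_aggregate(preds, losses, eta=10.0)
```

The aggregate leans heavily on the low-loss predictor while never fully discarding the others, which is what gives exponential weighting its regret guarantees.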
    DELTA: Dynamic Embedding Learning with Truncated Conscious Attention for CTR Prediction. (arXiv:2305.04891v2 [cs.IR] UPDATED)
    Click-Through Rate (CTR) prediction is a pivotal task in product and content recommendation, where learning effective feature embeddings is of great significance. However, traditional methods typically learn fixed feature representations without dynamically refining them according to the context information, leading to suboptimal performance. Some recent approaches attempt to address this issue by learning bit-wise weights or augmented embeddings for feature representations, but suffer from uninformative or redundant features in the context. To tackle this problem, inspired by the Global Workspace Theory in conscious processing, which posits that only a specific subset of the product features is pertinent while the rest can be noisy and even detrimental to human click behavior, we propose a CTR model that enables Dynamic Embedding Learning with Truncated Conscious Attention, termed DELTA. DELTA contains two key components: (I) conscious truncation module (CTM), which utilizes curriculum learning to apply adaptive truncation on attention weights to select the most critical features in the context; (II) explicit embedding optimization (EEO), which applies an auxiliary task during training that directly and independently propagates the gradient from the loss layer to the embedding layer, thereby optimizing the embedding explicitly via linear feature crossing. Extensive experiments on five challenging CTR datasets demonstrate that DELTA achieves new state-of-the-art performance among current CTR methods.
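The truncation idea can be sketched as keeping only the top-k attention weights and renormalizing. DELTA's CTM chooses the truncation adaptively via curriculum learning; the fixed k below is my simplification:

```python
import numpy as np

def truncated_attention(weights, k):
    """Keep only the top-k attention weights and renormalize:
    a simple stand-in for truncating away noisy context features."""
    out = np.zeros_like(weights)
    idx = np.argsort(weights)[-k:]       # indices of the k largest weights
    out[idx] = weights[idx]
    return out / out.sum()

attn = np.array([0.05, 0.40, 0.02, 0.35, 0.18])
trunc = truncated_attention(attn, k=2)
```

Downstream, the embedding aggregation would use `trunc` instead of `attn`, so low-weight (presumed noisy) features contribute exactly zero.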
    Differentiating Metropolis-Hastings to Optimize Intractable Densities. (arXiv:2306.07961v1 [stat.ML])
    When performing inference on probabilistic models, target densities often become intractable, necessitating the use of Monte Carlo samplers. We develop a methodology for unbiased differentiation of the Metropolis-Hastings sampler, allowing us to differentiate through probabilistic inference. By fusing recent advances in stochastic differentiation with Markov chain coupling schemes, the procedure can be made unbiased, low-variance, and automatic. This allows us to apply gradient-based optimization to objectives expressed as expectations over intractable target densities. We demonstrate our approach by finding an ambiguous observation in a Gaussian mixture model and by maximizing the specific heat in an Ising model.
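For reference, the sampler being differentiated is ordinary random-walk Metropolis-Hastings; a minimal version targeting a standard normal is below (the coupling and unbiased-differentiation machinery of the paper is not shown):

```python
import numpy as np

def metropolis_hastings(log_density, x0, n_steps, step, rng):
    """Random-walk Metropolis-Hastings targeting exp(log_density)."""
    x, samples = x0, []
    lp = log_density(x)
    for _ in range(n_steps):
        prop = x + step * rng.normal()        # symmetric Gaussian proposal
        lp_prop = log_density(prop)
        if np.log(rng.random()) < lp_prop - lp:   # accept/reject
            x, lp = prop, lp_prop
        samples.append(x)
    return np.array(samples)

rng = np.random.default_rng(0)
samples = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 20000, 1.0, rng)
mean, var = samples[2000:].mean(), samples[2000:].var()   # discard burn-in
```

The paper's contribution is, roughly, making an estimate like `samples.mean()` differentiable with respect to parameters of `log_density`, unbiasedly and with low variance.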
    SHAP-IQ: Unified Approximation of any-order Shapley Interactions. (arXiv:2303.01179v2 [cs.LG] UPDATED)
    Predominantly in explainable artificial intelligence (XAI) research, the Shapley value (SV) is applied to determine feature importance scores for any black box model. Shapley interaction indices extend the SV to define any-order feature interaction scores. Defining a unique Shapley interaction index is an open research question and, so far, three definitions have been proposed, which differ by their choice of axioms. Moreover, each definition requires a specific approximation technique. Here, we propose SHAPley Interaction Quantification (SHAP-IQ), an efficient sampling-based approximator to compute Shapley interactions for arbitrary cardinal interaction indices (CII), i.e. interaction indices that satisfy the linearity, symmetry and dummy axioms. SHAP-IQ is based on a novel representation and, in contrast to existing methods, we provide theoretical guarantees for its approximation quality, as well as estimates for the variance of the point estimates. For the special case of SV, our approach reveals a novel representation of the SV and corresponds to Unbiased KernelSHAP with a greatly simplified calculation. We illustrate the computational efficiency and effectiveness by explaining language, image classification and high-dimensional synthetic models.
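The sampling baseline that such approximators build on can be illustrated with the classic permutation estimator of plain Shapley values (interaction indices and the paper's variance guarantees are beyond this sketch). On an additive game the estimate is exact:

```python
import numpy as np

def shapley_mc(value_fn, n_features, n_perms, rng):
    """Monte Carlo permutation estimator of Shapley values: average each
    feature's marginal contribution over random orderings."""
    phi = np.zeros(n_features)
    for _ in range(n_perms):
        perm = rng.permutation(n_features)
        coalition = set()
        prev = value_fn(coalition)
        for f in perm:
            coalition.add(f)
            cur = value_fn(coalition)
            phi[f] += cur - prev          # marginal contribution of f
            prev = cur
    return phi / n_perms

# Additive toy game: v(S) = sum of weights in S, so the Shapley
# values equal the weights exactly, for any sampled orderings.
weights = np.array([2.0, -1.0, 0.5])
value_fn = lambda S: sum(weights[i] for i in S)

rng = np.random.default_rng(0)
phi = shapley_mc(value_fn, 3, 200, rng)
```

For non-additive games this estimator is unbiased but can be high-variance, which is the gap that more refined sampling schemes aim to close.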
    Mitigating Memorization of Noisy Labels by Clipping the Model Prediction. (arXiv:2212.04055v3 [cs.LG] UPDATED)
    In the presence of noisy labels, designing robust loss functions is critical for securing the generalization performance of deep neural networks. Cross Entropy (CE) loss has been shown to be not robust to noisy labels due to its unboundedness. To alleviate this issue, existing works typically design specialized robust losses with the symmetric condition, which usually lead to the underfitting issue. In this paper, our key idea is to induce a loss bound at the logit level, thus universally enhancing the noise robustness of existing losses. Specifically, we propose logit clipping (LogitClip), which clamps the norm of the logit vector to ensure that it is upper bounded by a constant. In this manner, CE loss equipped with our LogitClip method is effectively bounded, mitigating the overfitting to examples with noisy labels. Moreover, we present theoretical analyses to certify the noise-tolerant ability of LogitClip. Extensive experiments show that LogitClip not only significantly improves the noise robustness of CE loss, but also broadly enhances the generalization performance of popular robust losses.
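The abstract's core operation is directly implementable: rescale any logit vector whose L2 norm exceeds a threshold tau before applying cross entropy, which bounds the per-example loss even on confidently mislabeled data (a sketch of the stated method; the tau value here is arbitrary):

```python
import numpy as np

def logit_clip(logits, tau=1.0):
    """Clamp the L2 norm of each logit vector to at most tau, which
    bounds the cross entropy on any single (possibly mislabeled) example."""
    norms = np.linalg.norm(logits, axis=-1, keepdims=True)
    scale = np.minimum(1.0, tau / np.maximum(norms, 1e-12))
    return logits * scale

def cross_entropy(logits, label):
    z = logits - logits.max()            # stable log-softmax
    return -(z[label] - np.log(np.exp(z).sum()))

# An extreme, confidently-wrong logit vector: unclipped CE explodes,
# clipped CE stays bounded.
logits = np.array([30.0, -30.0, 0.0])
ce_raw = cross_entropy(logits, 1)
ce_clip = cross_entropy(logit_clip(logits, tau=1.0), 1)
```

Because the clipped loss is bounded above by a constant depending only on tau and the number of classes, a few noisy labels cannot dominate the gradient signal.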
    Consistent Explanations in the Face of Model Indeterminacy via Ensembling. (arXiv:2306.06193v2 [cs.LG] UPDATED)
    This work addresses the challenge of providing consistent explanations for predictive models in the presence of model indeterminacy, which arises due to the existence of multiple (nearly) equally well-performing models for a given dataset and task. Despite their similar performance, such models often exhibit inconsistent or even contradictory explanations for their predictions, posing challenges to end users who rely on these models to make critical decisions. Recognizing this issue, we introduce ensemble methods as an approach to enhance the consistency of the explanations provided in these scenarios. Leveraging insights from recent work on neural network loss landscapes and mode connectivity, we devise ensemble strategies to efficiently explore the underspecification set -- the set of models with performance variations resulting solely from changes in the random seed during training. Experiments on five benchmark financial datasets reveal that ensembling can yield significant improvements when it comes to explanation similarity, and demonstrate the potential of existing ensemble methods to explore the underspecification set efficiently. Our findings highlight the importance of considering model indeterminacy when interpreting explanations and showcase the effectiveness of ensembles in enhancing the reliability of explanations in machine learning.
    Backpropagation-free Training of Deep Physical Neural Networks. (arXiv:2304.11042v3 [cs.LG] UPDATED)
    Recent years have witnessed the outstanding success of deep learning in various fields such as vision and natural language processing. This success is largely owed to the massive size of deep learning models, which is expected to keep increasing. This growth of deep learning models is accompanied by issues related to their considerable energy consumption, both during the training and inference phases, as well as their scalability. Although a number of works based on unconventional physical systems have been proposed to address the issue of energy efficiency in the inference phase, efficient training of deep learning models has remained unaddressed. So far, training of digital deep learning models mainly relies on backpropagation, which is not suitable for physical implementation as it requires perfect knowledge of the computation performed in the so-called forward pass of the neural network. Here, we tackle this issue by proposing a simple deep neural network architecture augmented by a biologically plausible learning algorithm, referred to as "model-free forward-forward training". The proposed architecture enables training deep physical neural networks consisting of layers of physical nonlinear systems, without requiring detailed knowledge of the nonlinear physical layers' properties. We show that our method outperforms state-of-the-art hardware-aware training methods by improving training speed, decreasing digital computations, and reducing power consumption in physical systems. We demonstrate the adaptability of the proposed method, even in systems exposed to dynamic or unpredictable external perturbations. To showcase the universality of our approach, we train diverse wave-based physical neural networks that vary in the underlying wave phenomenon and the type of non-linearity they use, to perform vowel and image classification tasks experimentally.
    Unsupervised speech enhancement with deep dynamical generative speech and noise models. (arXiv:2306.07820v1 [eess.AS])
    This work builds on a previous work on unsupervised speech enhancement using a dynamical variational autoencoder (DVAE) as the clean speech model and non-negative matrix factorization (NMF) as the noise model. We propose to replace the NMF noise model with a deep dynamical generative model (DDGM) depending either on the DVAE latent variables, or on the noisy observations, or on both. This DDGM can be trained in three configurations: noise-agnostic, noise-dependent and noise adaptation after noise-dependent training. Experimental results show that the proposed method achieves competitive performance compared to state-of-the-art unsupervised speech enhancement methods, while the noise-dependent training configuration yields a much more time-efficient inference process.
    Deep Offline Reinforcement Learning for Real-world Treatment Optimization Applications. (arXiv:2302.07549v2 [cs.LG] UPDATED)
    There is increasing interest in data-driven approaches for recommending optimal treatment strategies in many chronic disease management and critical care applications. Reinforcement learning methods are well-suited to this sequential decision-making problem, but must be trained and evaluated exclusively on retrospective medical record datasets as direct online exploration is unsafe and infeasible. Despite this requirement, the vast majority of treatment optimization studies use off-policy RL methods (e.g., Double Deep Q Networks (DDQN) or its variants) that are known to perform poorly in purely offline settings. Recent advances in offline RL, such as Conservative Q-Learning (CQL), offer a suitable alternative. But there remain challenges in adapting these approaches to real-world applications where suboptimal examples dominate the retrospective dataset and strict safety constraints need to be satisfied. In this work, we introduce a practical and theoretically grounded transition sampling approach to address action imbalance during offline RL training. We perform extensive experiments on two real-world tasks for diabetes and sepsis treatment optimization to compare performance of the proposed approach against prominent off-policy and offline RL baselines (DDQN and CQL). Across a range of principled and clinically relevant metrics, we show that our proposed approach enables substantial improvements in expected health outcomes while remaining in accordance with relevant practice and safety guidelines.
    Spatio-Temporal Joint Graph Convolutional Networks for Traffic Forecasting. (arXiv:2111.13684v3 [cs.LG] UPDATED)
    Recent studies have shifted their focus towards formulating traffic forecasting as a spatio-temporal graph modeling problem. Typically, they constructed a static spatial graph at each time step and then connected each node with itself between adjacent time steps to create a spatio-temporal graph. However, this approach failed to explicitly reflect the correlations between different nodes at different time steps, thus limiting the learning capability of graph neural networks. Additionally, those models overlooked the dynamic spatio-temporal correlations among nodes by using the same adjacency matrix across different time steps. To address these limitations, we propose a novel approach called Spatio-Temporal Joint Graph Convolutional Networks (STJGCN) for accurate traffic forecasting on road networks over multiple future time steps. Specifically, our method encompasses the construction of both pre-defined and adaptive spatio-temporal joint graphs (STJGs) between any two time steps, which represent comprehensive and dynamic spatio-temporal correlations. We further introduce dilated causal spatio-temporal joint graph convolution layers on the STJG to capture spatio-temporal dependencies from distinct perspectives with multiple ranges. To aggregate information from different ranges, we propose a multi-range attention mechanism. Finally, we evaluate our approach on five public traffic datasets and experimental results demonstrate that STJGCN is not only computationally efficient but also outperforms 11 state-of-the-art baseline methods.
    HELP ME THINK: A Simple Prompting Strategy for Non-experts to Create Customized Content with Models. (arXiv:2208.08232v2 [cs.CL] UPDATED)
    Controlling the text generated by language models and customizing the content has been a long-standing challenge. Existing prompting techniques proposed in pursuit of providing control are task-specific and lack generality; this leaves non-expert users with an overwhelming number of choices when trying to find a suitable method for their task. The effort associated with those techniques, such as in writing examples, explanations, instructions, etc. further limits their adoption among non-expert users. In this paper, we propose a simple prompting strategy HELP ME THINK where we encourage GPT3 to help non-expert users by asking a set of relevant questions and leveraging user answers to execute the task. We demonstrate the efficacy of our technique HELP ME THINK on a variety of tasks. Specifically, we focus on tasks that are hard for average humans and require significant thinking to perform. We hope our work will encourage the development of unconventional ways to harness the power of large language models.
    PGB: A PubMed Graph Benchmark for Heterogeneous Network Representation Learning. (arXiv:2305.02691v2 [cs.LG] UPDATED)
    There has been a rapid growth in biomedical literature, yet capturing the heterogeneity of the bibliographic information of these articles remains relatively understudied. Although graph mining research via heterogeneous graph neural networks has taken center stage, it remains unclear whether these approaches capture the heterogeneity of the PubMed database, a vast digital repository containing over 33 million articles. We introduce PubMed Graph Benchmark (PGB), a new benchmark dataset for evaluating heterogeneous graph embeddings for biomedical literature. PGB is one of the largest heterogeneous networks to date and consists of 30 million English articles. The benchmark contains rich metadata including abstract, authors, citations, MeSH terms, MeSH hierarchy, and some other information. The benchmark contains three different evaluation tasks encompassing systematic reviews, node classification, and node clustering. In PGB, we aggregate the metadata associated with the biomedical articles from PubMed into a unified source and make the benchmark publicly available for any future works.
    Large Language Models Are Reasoning Teachers. (arXiv:2212.10071v2 [cs.CL] UPDATED)
    Recent works have shown that chain-of-thought (CoT) prompting can elicit language models to solve complex reasoning tasks, step-by-step. However, prompt-based CoT methods are dependent on very large models such as GPT-3 175B which are prohibitive to deploy at scale. In this paper, we use these large models as reasoning teachers to enable complex reasoning in smaller models and reduce model size requirements by several orders of magnitude. We propose Fine-tune-CoT, a method that generates reasoning samples from very large teacher models to fine-tune smaller models. We evaluate our method on a wide range of public models and complex tasks. We find that Fine-tune-CoT enables substantial reasoning capability in small models, far outperforming prompt-based baselines and even the teacher model in many tasks. Additionally, we extend our method by leveraging the teacher model's ability to generate multiple distinct rationales for each original sample. Enriching the fine-tuning data with such diverse reasoning results in a substantial performance boost across datasets, even for very small models. We conduct ablations and sample studies to understand the emergence of reasoning capabilities of student models. Our code implementation and data are available at https://github.com/itsnamgyu/reasoning-teacher.
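    The data-generation step of Fine-tune-CoT can be sketched as follows; `teacher_generate` is a hypothetical stand-in for sampling a rationale and final answer from the large teacher model, and the correctness filter reflects the idea of keeping only reasoning chains that end in the gold answer (a common practice, assumed here):

```python
def build_finetune_cot_data(samples, teacher_generate, n_rationales=4):
    """Collect teacher rationales to fine-tune a small student model.

    samples: iterable of (question, gold_answer) pairs.
    teacher_generate: hypothetical callable returning (rationale, predicted_answer).
    """
    data = []
    for question, gold in samples:
        # diverse reasoning: sample several rationales per question
        for _ in range(n_rationales):
            rationale, pred = teacher_generate(question)
            if pred == gold:  # keep only chains ending in the correct answer
                data.append({"prompt": question, "completion": rationale})
    return data
```

    The resulting prompt/completion pairs are then used for standard supervised fine-tuning of the student.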
    Incentivizing High-Quality Content in Online Recommender Systems. (arXiv:2306.07479v1 [cs.GT])
    For content recommender systems such as TikTok and YouTube, the platform's decision algorithm shapes the incentives of content producers, including how much effort the content producers invest in the quality of their content. Many platforms employ online learning, which creates intertemporal incentives, since content produced today affects recommendations of future content. In this paper, we study the incentives arising from online learning, analyzing the quality of content produced at a Nash equilibrium. We show that classical online learning algorithms, such as Hedge and EXP3, unfortunately incentivize producers to create low-quality content. In particular, the quality of content is upper bounded in terms of the learning rate and approaches zero for typical learning rate schedules. Motivated by this negative result, we design a different learning algorithm -- based on punishing producers who create low-quality content -- that correctly incentivizes producers to create high-quality content. At a conceptual level, our work illustrates the unintended impact that a platform's learning algorithm can have on content quality and opens the door towards designing platform learning algorithms that incentivize the creation of high-quality content.
    GPT-Calls: Enhancing Call Segmentation and Tagging by Generating Synthetic Conversations via Large Language Models. (arXiv:2306.07941v1 [cs.CL])
    Transcriptions of phone calls are of significant value across diverse fields, such as sales, customer service, healthcare, and law enforcement. Nevertheless, the analysis of these recorded conversations can be an arduous and time-intensive process, especially when dealing with extended or multifaceted dialogues. In this work, we propose a novel method, GPT-distilled Calls Segmentation and Tagging (GPT-Calls), for efficient and accurate call segmentation and topic extraction. GPT-Calls is composed of offline and online phases. The offline phase is applied once to a given list of topics and involves generating a distribution of synthetic sentences for each topic using a GPT model and extracting anchor vectors. The online phase is applied to every call separately and scores the similarity between the transcribed conversation and the topic anchors found in the offline phase. Then, time domain analysis is applied to the similarity scores to group utterances into segments and tag them with topics. The proposed paradigm provides an accurate and efficient method for call segmentation and topic extraction that does not require labeled data, thus making it a versatile approach applicable to various domains. Our algorithm operates in production under Dynamics 365 Sales Conversation Intelligence, and our research is based on real sales conversations gathered from various Dynamics 365 Sales tenants.
    Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-Per-Second. (arXiv:2306.07552v1 [cs.LG])
    We present Galactic, a large-scale simulation and reinforcement-learning (RL) framework for robotic mobile manipulation in indoor environments. Specifically, a Fetch robot (equipped with a mobile base, 7DoF arm, RGBD camera, egomotion, and onboard sensing) is spawned in a home environment and asked to rearrange objects - by navigating to an object, picking it up, navigating to a target location, and then placing the object at the target location. Galactic is fast. In terms of simulation speed (rendering + physics), Galactic achieves over 421,000 steps-per-second (SPS) on an 8-GPU node, which is 54x faster than Habitat 2.0 (7699 SPS). More importantly, Galactic was designed to optimize the entire rendering + physics + RL interplay since any bottleneck in the interplay slows down training. In terms of simulation+RL speed (rendering + physics + inference + learning), Galactic achieves over 108,000 SPS, which is 88x faster than Habitat 2.0 (1243 SPS). These massive speed-ups not only drastically cut the wall-clock training time of existing experiments, but also unlock an unprecedented scale of new experiments. First, Galactic can train a mobile pick skill to >80% accuracy in under 16 minutes, a 100x speedup compared to the over 24 hours it takes to train the same skill in Habitat 2.0. Second, we use Galactic to perform the largest-scale experiment to date for rearrangement using 5B steps of experience in 46 hours, which is equivalent to 20 years of robot experience. This scaling results in a single neural network composed of task-agnostic components achieving 85% success in GeometricGoal rearrangement, compared to 0% success reported in Habitat 2.0 for the same approach. The code is available at github.com/facebookresearch/galactic.
    Omega: Optimistic EMA Gradients. (arXiv:2306.07905v1 [cs.LG])
    Stochastic min-max optimization has gained interest in the machine learning community with the advancements in GANs and adversarial training. Although game optimization is fairly well understood in the deterministic setting, some issues persist in the stochastic regime. Recent work has shown that stochastic gradient descent-ascent methods such as the optimistic gradient are highly sensitive to noise or can fail to converge. Although alternative strategies exist, they can be prohibitively expensive. We introduce Omega, a method with optimistic-like updates that mitigates the impact of noise by incorporating an EMA of historic gradients in its update rule. We also explore a variation of this algorithm that incorporates momentum. Although we do not provide convergence guarantees, our experiments on stochastic games show that Omega outperforms the optimistic gradient method when applied to linear players.
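    A rough sketch of an optimistic-style update that substitutes an EMA of past gradients for the single previous gradient (an illustrative simplification under assumed hyperparameters `lr` and `beta`, not the paper's exact update rule):

```python
def omega_step(x, grad, ema, lr=0.01, beta=0.9):
    """One optimistic-like step using an EMA of historic gradients.

    grad: current stochastic gradient at x.
    ema:  running EMA state carried over from previous steps.
    """
    ema = beta * ema + (1 - beta) * grad
    # extrapolate with the EMA in place of the last raw gradient
    x = x - lr * (grad + beta * (grad - ema))
    return x, ema
```

    On a noiseless quadratic, iterating this update drives the iterate toward the minimizer, while the EMA smooths out gradient noise in the stochastic case.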
    Exact Solutions of a Deep Linear Network. (arXiv:2202.04777v7 [stat.ML] UPDATED)
    This work finds the analytical expression of the global minima of a deep linear network with weight decay and stochastic neurons, a fundamental model for understanding the landscape of neural networks. Our result implies that the origin is a special point in the deep neural network loss landscape where highly nonlinear phenomena emerge. We show that weight decay strongly interacts with the model architecture and can create bad minima at zero in a network with more than $1$ hidden layer, qualitatively different from a network with only $1$ hidden layer. Practically, our result implies that common deep learning initialization methods are insufficient to ease the optimization of neural networks in general.
    Privacy Preserving Bayesian Federated Learning in Heterogeneous Settings. (arXiv:2306.07959v1 [cs.LG])
    In several practical applications of federated learning (FL), the clients are highly heterogeneous in terms of both their data and compute resources, and therefore enforcing the same model architecture for each client is very limiting. Moreover, the need for uncertainty quantification and data privacy constraints are often particularly amplified for clients that have limited local data. This paper presents a unified FL framework to simultaneously address all these constraints and concerns, based on training customized local Bayesian models that learn well even in the absence of large local datasets. A Bayesian framework provides a natural way of incorporating supervision in the form of prior distributions. We use priors in the functional (output) space of the networks to facilitate collaboration across heterogeneous clients. Moreover, formal differential privacy guarantees are provided for this framework. Experiments on standard FL datasets demonstrate that our approach outperforms strong baselines in both homogeneous and heterogeneous settings and under strict privacy constraints, while also providing characterizations of model uncertainties.
    One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning. (arXiv:2306.07967v1 [cs.LG])
    We present Generalized LoRA (GLoRA), an advanced approach for universal parameter-efficient fine-tuning tasks. Enhancing Low-Rank Adaptation (LoRA), GLoRA employs a generalized prompt module to optimize pre-trained model weights and adjust intermediate activations, providing more flexibility and capability across diverse tasks and datasets. Moreover, GLoRA facilitates efficient parameter adaptation by employing a scalable, modular, layer-wise structure search that learns individual adapter of each layer. Originating from a unified mathematical formulation, GLoRA exhibits strong transfer learning, few-shot learning and domain generalization abilities, as it adjusts to new tasks through additional dimensions on weights and activations. Comprehensive experiments demonstrate that GLoRA outperforms all previous methods in natural, specialized, and structured benchmarks, achieving superior accuracy with fewer parameters and computations on various datasets. Furthermore, our structural re-parameterization design ensures that GLoRA incurs no extra inference cost, rendering it a practical solution for resource-limited applications. Code is available at: https://github.com/Arnav0400/ViT-Slim/tree/master/GLoRA.
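    As background, the plain LoRA update that GLoRA generalizes adds a low-rank term to frozen pre-trained weights; a minimal sketch (this illustrates standard LoRA, not GLoRA's generalized prompt module or layer-wise structure search):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """Forward pass with a frozen weight W plus a low-rank update B @ A.

    W: (out, in) frozen pre-trained weight.
    A: (r, in) and B: (out, r) are the small trainable adapter matrices.
    """
    return x @ (W + alpha * (B @ A)).T
```

    With `B` initialized to zero, the adapted model starts exactly at the pre-trained one, and `W + alpha * B @ A` can be merged after training so inference incurs no extra cost.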
    Decentralized Hyper-Gradient Computation over Time-Varying Directed Networks. (arXiv:2210.02129v3 [stat.ML] UPDATED)
    This paper addresses the communication issues when estimating hyper-gradients in decentralized federated learning (FL). Hyper-gradients in decentralized FL quantify how the performance of the globally shared optimal model is influenced by the perturbations in clients' hyper-parameters. In prior work, clients trace this influence through the communication of Hessian matrices over a static undirected network, resulting in (i) excessive communication costs and (ii) inability to make use of more efficient and robust networks, namely, time-varying directed networks. To solve these issues, we introduce an alternative optimality condition for FL using an averaging operation on model parameters and gradients. We then employ Push-Sum as the averaging operation, which is a consensus optimization technique for time-varying directed networks. As a result, the hyper-gradient estimator derived from our optimality condition enjoys two desirable properties: (i) it only requires Push-Sum communication of vectors and (ii) it can operate over time-varying directed networks. We confirm the convergence of our estimator to the true hyper-gradient both theoretically and empirically, and we further demonstrate that it enables two novel applications: decentralized influence estimation and personalization over time-varying networks.
    Adaptive Stopping Rule for Kernel-based Gradient Descent Algorithms. (arXiv:2001.02879v2 [cs.LG] UPDATED)
    In this paper, we propose an adaptive stopping rule for kernel-based gradient descent (KGD) algorithms. We introduce the empirical effective dimension to quantify the increments of iterations in KGD and derive an implementable early stopping strategy. We analyze the performance of the adaptive stopping rule in the framework of learning theory. Using the recently developed integral operator approach, we rigorously prove the optimality of the adaptive stopping rule in terms of showing the optimal learning rates for KGD equipped with this rule. Furthermore, a sharp bound on the number of iterations in KGD equipped with the proposed early stopping rule is also given to demonstrate its computational advantage.
    Coordinated Dynamic Bidding in Repeated Second-Price Auctions with Budgets. (arXiv:2306.07709v1 [cs.GT])
    In online ad markets, a rising number of advertisers are employing bidding agencies to participate in ad auctions. These agencies are specialized in designing online algorithms and bidding on behalf of their clients. Typically, an agency has information on multiple advertisers, so she can potentially coordinate bids to help her clients achieve higher utilities than those under independent bidding. In this paper, we study coordinated online bidding algorithms in repeated second-price auctions with budgets. We propose algorithms that guarantee every client a higher utility than the best she can get under independent bidding. We show that these algorithms achieve maximal coalition welfare and discuss bidders' incentives to misreport their budgets, in symmetric cases. Our proofs combine the techniques of online learning and equilibrium analysis, overcoming the difficulty of competing with a multi-dimensional benchmark. The performance of our algorithms is further evaluated by experiments on both synthetic and real data. To the best of our knowledge, we are the first to consider bidder coordination in online repeated auctions with constraints.
    Learning Unnormalized Statistical Models via Compositional Optimization. (arXiv:2306.07485v1 [cs.LG])
    Learning unnormalized statistical models (e.g., energy-based models) is computationally challenging due to the complexity of handling the partition function. To eschew this complexity, noise-contrastive estimation~(NCE) has been proposed by formulating the objective as the logistic loss of the real data and the artificial noise. However, as found in previous works, NCE may perform poorly in many tasks due to its flat loss landscape and slow convergence. In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models from the perspective of compositional optimization. To tackle the partition function, a noise distribution is introduced such that the log partition function can be written as a compositional function whose inner function can be estimated with stochastic samples. Hence, the objective can be optimized by stochastic compositional optimization algorithms. Despite being a simple method, we demonstrate that it is more favorable than NCE by (1) establishing a fast convergence rate and quantifying its dependence on the noise distribution through the variance of stochastic estimators; (2) developing better results for one-dimensional Gaussian mean estimation by showing our objective has a much more favorable loss landscape and hence our method enjoys faster convergence; (3) demonstrating better performance on multiple applications, including density estimation, out-of-distribution detection, and real image generation.
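    The inner estimation step, writing the partition function as an expectation under a noise distribution, can be sketched as importance sampling; here `f` is the unnormalized log-density and `sample_noise`/`log_q` describe a chosen noise distribution (all names are hypothetical, and the paper optimizes this inner expectation with stochastic compositional algorithms rather than estimating it once):

```python
import math

def log_partition_estimate(f, sample_noise, log_q, n=1000):
    """Stochastic estimate of log Z, where Z = E_{x~q}[exp(f(x)) / q(x)].

    f: unnormalized log-density; sample_noise: draws x ~ q; log_q: log q(x).
    """
    total = 0.0
    for _ in range(n):
        x = sample_noise()
        total += math.exp(f(x) - log_q(x))  # importance weight exp(f)/q
    return math.log(total / n)
```

    When `f` coincides with `log_q`, every importance weight is 1 and the estimate is exactly 0, a useful sanity check.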
    Deep Demixing: Reconstructing the Evolution of Network Epidemics. (arXiv:2306.07938v1 [cs.SI])
    We propose the deep demixing (DDmix) model, a graph autoencoder that can reconstruct epidemics evolving over networks from partial or aggregated temporal information. Assuming knowledge of the network topology but not of the epidemic model, our goal is to estimate the complete propagation path of a disease spread. A data-driven approach is leveraged to overcome the lack of model awareness. To solve this inverse problem, DDmix is proposed as a graph conditional variational autoencoder that is trained from past epidemic spreads. DDmix seeks to capture key aspects of the underlying (unknown) spreading dynamics in its latent space. Using epidemic spreads simulated in synthetic and real-world networks, we demonstrate the accuracy of DDmix by comparing it with multiple (non-graph-aware) learning algorithms. The generalizability of DDmix is highlighted across different types of networks. Finally, we showcase that a simple post-processing extension of our proposed method can help identify super-spreaders in the reconstructed propagation path.
    Bag of Image Patch Embedding Behind the Success of Self-Supervised Learning. (arXiv:2206.08954v2 [cs.CV] UPDATED)
    Self-supervised learning (SSL) has recently achieved tremendous empirical advancements in learning image representation. However, our understanding of the principle behind learning such a representation is still limited. This work shows that joint-embedding SSL approaches primarily learn a representation of image patches, which reflects their co-occurrence. Such a connection to co-occurrence modeling can be established formally, and it supplements the prevailing invariance perspective. We empirically show that learning a representation for fixed-scale patches and aggregating local patch representations as the image representation achieves similar or even better results than the baseline methods. We denote this process as BagSSL. Even with 32x32 patch representation, BagSSL achieves 62% top-1 linear probing accuracy on ImageNet. On the other hand, with a multi-scale pretrained model, we show that the whole image embedding is approximately the average of local patch embeddings. While the SSL representation is relatively invariant at the global scale, we show that locality is preserved when we zoom into local patch-level representation. Further, we show that patch representation aggregation can improve various SOTA baseline methods by a large margin. The patch representation is considerably easier to understand, and this work makes a step to demystify self-supervised representation learning.
    Vector-Quantized Graph Auto-Encoder. (arXiv:2306.07735v1 [cs.LG])
    In this work, we address the problem of modeling distributions of graphs. We introduce the Vector-Quantized Graph Auto-Encoder (VQ-GAE), a permutation-equivariant discrete auto-encoder designed to model the distribution of graphs. By exploiting the permutation-equivariance of graph neural networks (GNNs), our autoencoder circumvents the problem of the ordering of the graph representation. We leverage the capability of GNNs to capture local structures of graphs while employing vector-quantization to prevent the mapping of discrete objects to a continuous latent space. Furthermore, the use of autoregressive models enables us to capture the global structure of graphs via the latent representation. We evaluate our model on standard datasets used for graph generation and observe that it achieves excellent performance on some of the most salient evaluation metrics compared to the state-of-the-art.
    Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective. (arXiv:2306.07528v1 [cs.LG])
    Off-policy Learning to Rank (LTR) aims to optimize a ranker from data collected by a deployed logging policy. However, existing off-policy learning to rank methods often make strong assumptions about how users generate the click data, i.e., the click model, and hence need to tailor their methods specifically under different click models. In this paper, we unify the ranking process under general stochastic click models as a Markov Decision Process (MDP), and the optimal ranking can be learned with offline reinforcement learning (RL) directly. Building upon this, we leverage offline RL techniques for off-policy LTR and propose the Click Model-Agnostic Unified Off-policy Learning to Rank (CUOLR) method, which could be easily applied to a wide range of click models. Through a dedicated formulation of the MDP, we show that offline RL algorithms can adapt to various click models without complex debiasing techniques and prior knowledge of the model. Results on various large-scale datasets demonstrate that CUOLR consistently outperforms the state-of-the-art off-policy learning to rank algorithms while maintaining consistency and robustness under different click models.
    Fixed-Budget Best-Arm Identification with Heterogeneous Reward Variances. (arXiv:2306.07549v1 [cs.LG])
    We study the problem of best-arm identification (BAI) in the fixed-budget setting with heterogeneous reward variances. We propose two variance-adaptive BAI algorithms for this setting: SHVar for known reward variances and SHAdaVar for unknown reward variances. Our algorithms rely on non-uniform budget allocations among the arms where the arms with higher reward variances are pulled more often than those with lower variances. The main algorithmic novelty is in the design of SHAdaVar, which allocates budget greedily based on overestimating the unknown reward variances. We bound probabilities of misidentifying the best arms in both SHVar and SHAdaVar. Our analyses rely on novel lower bounds on the number of pulls of an arm that do not require closed-form solutions to the budget allocation problem. Since one of our budget allocation problems is analogous to the optimal experiment design with unknown variances, we believe that our results are of a broad interest. Our experiments validate our theory, and show that SHVar and SHAdaVar outperform algorithms from prior works with analytical guarantees.
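    The non-uniform allocation idea, pulling arms in proportion to their estimated reward variances, can be sketched as follows; the simple proportional rule is an illustrative assumption, not the exact SHVar/SHAdaVar allocation:

```python
def variance_proportional_pulls(var_estimates, budget):
    """Split a pull budget across arms in proportion to variance estimates."""
    total = sum(var_estimates)
    pulls = [int(budget * v / total) for v in var_estimates]
    # hand leftover pulls (from integer rounding) to the highest-variance arm
    noisiest = max(range(len(pulls)), key=lambda i: var_estimates[i])
    pulls[noisiest] += budget - sum(pulls)
    return pulls
```

    In SHAdaVar the variances are unknown and are deliberately overestimated from observed rewards, which errs on the side of sampling uncertain arms more.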
    StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models. (arXiv:2306.07691v1 [eess.AS])
    In this paper, we present StyleTTS 2, a text-to-speech (TTS) model that leverages style diffusion and adversarial training with large speech language models (SLMs) to achieve human-level TTS synthesis. StyleTTS 2 differs from its predecessor by modeling styles as a latent random variable through diffusion models to generate the most suitable style for the text without requiring reference speech, achieving efficient latent diffusion while benefiting from the diverse speech synthesis offered by diffusion models. Furthermore, we employ large pre-trained SLMs, such as WavLM, as discriminators with our novel differentiable duration modeling for end-to-end training, resulting in improved speech naturalness. StyleTTS 2 surpasses human recordings on the single-speaker LJSpeech dataset and matches it on the multispeaker VCTK dataset as judged by native English speakers. Moreover, when trained on the LibriTTS dataset, our model outperforms previous publicly available models for zero-shot speaker adaptation. This work achieves the first human-level TTS on both single and multispeaker datasets, showcasing the potential of style diffusion and adversarial training with large SLMs. The audio demos and source code are available at https://styletts2.github.io/.
    GeneCIS: A Benchmark for General Conditional Image Similarity. (arXiv:2306.07969v1 [cs.CV])
    We argue that there are many notions of 'similarity' and that models, like humans, should be able to adapt to these dynamically. This contrasts with most representation learning methods, supervised or self-supervised, which learn a fixed embedding function and hence implicitly assume a single notion of similarity. For instance, models trained on ImageNet are biased towards object categories, while a user might prefer the model to focus on colors, textures or specific elements in the scene. In this paper, we propose the GeneCIS ('genesis') benchmark, which measures models' ability to adapt to a range of similarity conditions. Extending prior work, our benchmark is designed for zero-shot evaluation only, and hence considers an open-set of similarity conditions. We find that baselines from powerful CLIP models struggle on GeneCIS and that performance on the benchmark is only weakly correlated with ImageNet accuracy, suggesting that simply scaling existing methods is not fruitful. We further propose a simple, scalable solution based on automatically mining information from existing image-caption datasets. We find our method offers a substantial boost over the baselines on GeneCIS, and further improves zero-shot performance on related image retrieval benchmarks. In fact, though evaluated zero-shot, our model surpasses state-of-the-art supervised models on MIT-States. Project page at https://sgvaze.github.io/genecis/.
    Multi-modal Representation Learning for Social Post Location Inference. (arXiv:2306.07935v1 [cs.CL])
    Inferring geographic locations via social posts is essential for many practical location-based applications such as product marketing, point-of-interest recommendation, and infector tracking for COVID-19. Unlike image-based location retrieval or social-post text embedding-based location inference, the combined effect of multi-modal information (i.e., post images, text, and hashtags) for social post positioning receives less attention. In this work, we collect real datasets of social posts with images, texts, and hashtags from Instagram and propose a novel Multi-modal Representation Learning Framework (MRLF) capable of fusing different modalities of social posts for location inference. MRLF integrates a multi-head attention mechanism to enhance location-salient information extraction while significantly improving location inference compared with single domain-based methods. To handle the noise in user-generated textual content, we introduce a novel attention-based character-aware module that considers the relative dependencies between characters of social post texts and hashtags for flexible multi-modal information fusion. The experimental results show that MRLF can make accurate location predictions and open a new door to understanding the multi-modal data of social posts for online inference tasks.
    Rethinking Adversarial Training with A Simple Baseline. (arXiv:2306.07613v1 [cs.CV])
    We report competitive results on RobustBench for CIFAR and SVHN using a simple yet effective baseline approach. Our approach involves a training protocol that integrates rescaled square loss, cyclic learning rates, and erasing-based data augmentation. The outcomes we achieve are comparable to those of models trained with state-of-the-art techniques, which are currently the predominant choice for adversarial training. Our baseline, referred to as SimpleAT, yields three novel empirical insights. (i) By switching to square loss, the accuracy is comparable to that obtained with the de facto training protocol plus data augmentation. (ii) A single cyclic learning rate is a good scheduler that can effectively reduce the risk of robust overfitting. (iii) Employing rescaled square loss during model training can yield a favorable balance between adversarial and natural accuracy. In general, our experimental results show that SimpleAT effectively mitigates robust overfitting and consistently achieves the best performance at the end of training. For example, on CIFAR-10 with ResNet-18, SimpleAT achieves approximately 52% adversarial accuracy against the current strong AutoAttack. Furthermore, SimpleAT exhibits robust performance on various image corruptions, including those commonly found in the CIFAR-10-C dataset. Finally, we assess the effectiveness of these insights through two techniques: bias-variance analysis and logit penalty methods. Our findings demonstrate that all of these simple techniques are capable of reducing the variance of model predictions, which is regarded as the primary contributor to robust overfitting. In addition, our analysis also uncovers connections with various advanced state-of-the-art methods.
    The Dormant Neuron Phenomenon in Deep Reinforcement Learning. (arXiv:2302.12902v2 [cs.LG] UPDATED)
    In this work we identify the dormant neuron phenomenon in deep reinforcement learning, where an agent's network suffers from an increasing number of inactive neurons, thereby affecting network expressivity. We demonstrate the presence of this phenomenon across a variety of algorithms and environments, and highlight its effect on learning. To address this issue, we propose a simple and effective method (ReDo) that Recycles Dormant neurons throughout training. Our experiments demonstrate that ReDo maintains the expressive power of networks by reducing the number of dormant neurons and results in improved performance.
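    The recycling step can be sketched with plain lists standing in for tensors. The dormancy score (mean absolute activation normalized by the layer average) follows the description above, but the threshold value, function names, and uniform reinitialization are illustrative assumptions, not the paper's exact recipe.

```python
import random

def dormant_mask(activations, tau=0.1):
    """Flag neurons whose mean |activation|, normalized by the layer
    average, falls at or below tau. Rows of `activations` are samples,
    columns are neurons."""
    scores = [sum(abs(a) for a in col) / len(col) for col in zip(*activations)]
    layer_mean = sum(scores) / len(scores)
    return [s / layer_mean <= tau for s in scores]

def recycle(weights, mask, rng=random.Random(0), scale=0.05):
    """Reinitialize the incoming weights (one row per neuron) of dormant
    neurons; active neurons keep their weights unchanged."""
    return [[rng.uniform(-scale, scale) for _ in row] if dead else row
            for row, dead in zip(weights, mask)]
```

Applied periodically during training, this keeps the count of inactive neurons down without resetting the rest of the network.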
    A Simple Unified Uncertainty-Guided Framework for Offline-to-Online Reinforcement Learning. (arXiv:2306.07541v1 [cs.LG])
    Offline reinforcement learning (RL) provides a promising solution to learning an agent fully relying on a data-driven paradigm. However, constrained by the limited quality of the offline dataset, its performance is often sub-optimal. Therefore, it is desired to further finetune the agent via extra online interactions before deployment. Unfortunately, offline-to-online RL faces two main challenges: constrained exploratory behavior and state-action distribution shift. To this end, we propose a Simple Unified uNcertainty-Guided (SUNG) framework, which naturally unifies the solution to both challenges with the tool of uncertainty. Specifically, SUNG quantifies uncertainty via a VAE-based state-action visitation density estimator. To facilitate efficient exploration, SUNG presents a practical optimistic exploration strategy to select informative actions with both high value and high uncertainty. Moreover, SUNG develops an adaptive exploitation method by applying conservative offline RL objectives to high-uncertainty samples and standard online RL objectives to low-uncertainty samples to smoothly bridge offline and online stages. SUNG achieves state-of-the-art online finetuning performance when combined with different offline RL methods, across various environments and datasets in the D4RL benchmark.
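    The optimistic exploration step, selecting actions that score high on both value and uncertainty, can be sketched as a simple two-stage ranking. The name, the top-k scheme, and the default k are illustrative assumptions; in the paper the values come from a Q-network and the uncertainties from the VAE density estimator.

```python
def optimistic_select(actions, q_values, uncertainties, k=3):
    """Among the top-k actions by estimated value, pick the one with the
    highest uncertainty (value filters first, uncertainty breaks the tie
    in favor of exploration)."""
    top_k = sorted(range(len(actions)), key=lambda i: q_values[i], reverse=True)[:k]
    best = max(top_k, key=lambda i: uncertainties[i])
    return actions[best]
```

A low-value but highly uncertain action is never chosen, since it is filtered out before the uncertainty ranking.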
    Human-Like Intuitive Behavior and Reasoning Biases Emerged in Language Models -- and Disappeared in GPT-4. (arXiv:2306.07622v1 [cs.CL])
    Large language models (LLMs) are currently at the forefront of intertwining AI systems with human communication and everyday life. Therefore, it is of great importance to evaluate their emerging abilities. In this study, we show that LLMs, most notably GPT-3, exhibit behavior that strikingly resembles human-like intuition -- and the cognitive errors that come with it. However, LLMs with higher cognitive capabilities, in particular ChatGPT and GPT-4, learned to avoid succumbing to these errors and perform in a hyperrational manner. For our experiments, we probe LLMs with the Cognitive Reflection Test (CRT) as well as semantic illusions that were originally designed to investigate intuitive decision-making in humans. Moreover, we probe how robust this inclination toward intuitive decision-making is. Our study demonstrates that investigating LLMs with methods from psychology has the potential to reveal otherwise unknown emergent traits.
    Hand Gesture Recognition through Reflected Infrared Light Wave Signals. (arXiv:2301.05955v2 [eess.SP] UPDATED)
    In this study, we present a wireless (non-contact) gesture recognition method using only incoherent light wave signals reflected from a human subject. In comparison to existing radar, light shadow, sound and camera-based sensing systems, this technology uses a low-cost ubiquitous light source (e.g., infrared LED) to send light towards the subject's hand performing gestures and the reflected light is collected by a light sensor (e.g., photodetector). This light wave sensing system recognizes different gestures from the variations of the received light intensity within a 20-35cm range. The hand gesture recognition results demonstrate up to 96% accuracy on average. The developed system can be utilized in numerous Human-computer Interaction (HCI) applications as a low-cost and non-contact gesture recognition technology.
    Differential Privacy with Random Projections and Sign Random Projections. (arXiv:2306.01751v2 [cs.CR] UPDATED)
    In this paper, we develop a series of differential privacy (DP) algorithms from a family of random projections (RP) for general applications in machine learning, data mining, and information retrieval. Among the presented algorithms, iDP-SignRP is remarkably effective under the setting of "individual differential privacy" (iDP), based on sign random projections (SignRP). Also, DP-SignOPORP considerably improves existing algorithms in the literature under the standard DP setting, using "one permutation + one random projection" (OPORP), where OPORP is a variant of the celebrated count-sketch method with fixed-length binning and normalization. Without taking signs, among the DP-RP family, DP-OPORP achieves the best performance. Our key idea for improving DP-RP is to take only the signs, i.e., $\mathrm{sign}(x_j) = \mathrm{sign}\left(\sum_{i=1}^p u_i w_{ij}\right)$, of the projected data. The intuition is that the signs often remain unchanged when the original data ($u$) exhibit small changes (according to the "neighbor" definition in DP). In other words, the aggregation and quantization operations themselves provide good privacy protections. We develop a technique called "smooth flipping probability" that incorporates this intuitive privacy benefit of SignRPs and improves the standard DP bit flipping strategy. Based on this technique, we propose DP-SignOPORP which satisfies strict DP and outperforms other DP variants based on SignRP (and RP), especially when $\epsilon$ is not very large (e.g., $\epsilon = 5\sim10$). Moreover, if an application scenario accepts individual DP, then we immediately obtain an algorithm named iDP-SignRP which achieves excellent utilities even at small $\epsilon$ (e.g., $\epsilon<0.5$).
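    The core SignRP operation, keeping only the signs of the projected data and then privatizing them by random bit flips, can be sketched as follows. The fixed flip probability is the standard randomized-response baseline that the "smooth flipping probability" technique refines per coordinate; all names here are illustrative.

```python
import random

def sign_rp(x, proj):
    """Project x onto each column of a random projection matrix and keep
    only the sign of each projected coordinate."""
    return [1 if sum(u * w for u, w in zip(x, col)) >= 0 else -1
            for col in proj]

def dp_flip(bits, flip_prob, rng=random.Random(0)):
    """Randomized response on the sign bits: flip each sign independently
    with probability flip_prob (a basic DP bit-flipping mechanism)."""
    return [-b if rng.random() < flip_prob else b for b in bits]
```

Because small perturbations of the input rarely change the signs, the flipped sign vector can be released with less added noise than the raw projections would require.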
    Numerical Methods For PDEs Over Manifolds Using Spectral Physics Informed Neural Networks. (arXiv:2302.05322v2 [cs.LG] UPDATED)
    We introduce an approach for solving PDEs over manifolds using physics informed neural networks whose architecture aligns with spectral methods. The networks are trained to take in as input samples of an initial condition, a time stamp and point(s) on the manifold and then output the solution's value at the given time and point(s). We provide proofs of our method for the heat equation on the interval and examples of unique network architectures that are adapted to nonlinear equations on the sphere and the torus. We also show that our spectral-inspired neural network architectures outperform the standard physics informed architectures. Our extensive experimental results include generalization studies where the testing dataset of initial conditions is randomly sampled from a significantly larger space than the training set.
    Effective control of two-dimensional Rayleigh--B\'enard convection: invariant multi-agent reinforcement learning is all you need. (arXiv:2304.02370v2 [physics.flu-dyn] UPDATED)
    Rayleigh-B\'enard convection (RBC) is a recurrent phenomenon in several industrial and geoscience flows and a well-studied system from a fundamental fluid-mechanics viewpoint. However, controlling RBC, for example by modulating the spatial distribution of the bottom-plate heating in the canonical RBC configuration, remains a challenging topic for classical control-theory methods. In the present work, we apply deep reinforcement learning (DRL) for controlling RBC. We show that effective RBC control can be obtained by leveraging invariant multi-agent reinforcement learning (MARL), which takes advantage of the locality and translational invariance inherent to RBC flows inside wide channels. The MARL framework applied to RBC allows for an increase in the number of control segments without encountering the curse of dimensionality that would result from a naive increase in the DRL action-size dimension. This is made possible by the ability of MARL to reuse the knowledge generated in different parts of the RBC domain. We show in a case study that MARL DRL is able to discover an advanced control strategy that destabilizes the spontaneous RBC double-cell pattern, changes the topology of RBC by coalescing adjacent convection cells, and actively controls the resulting coalesced cell to bring it to a new stable configuration. This modified flow configuration results in reduced convective heat transfer, which is beneficial in several industrial processes. Therefore, our work both shows the potential of MARL DRL for controlling large RBC systems and demonstrates the possibility for DRL to discover strategies that move the RBC configuration between different topological configurations, yielding desirable heat-transfer characteristics. These results are useful both for gaining further understanding of the intrinsic properties of RBC and for developing industrial applications.
    FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction. (arXiv:2305.02549v2 [cs.CL] UPDATED)
    The recent advent of self-supervised pre-training techniques has led to a surge in the use of multimodal learning in form document understanding. However, existing approaches that extend the mask language modeling to other modalities require careful multi-task tuning, complex reconstruction target designs, or additional pre-training data. In FormNetV2, we introduce a centralized multimodal graph contrastive learning strategy to unify self-supervised pre-training for all modalities in one loss. The graph contrastive objective maximizes the agreement of multimodal representations, providing a natural interplay for all modalities without special customization. In addition, we extract image features within the bounding box that joins a pair of tokens connected by a graph edge, capturing more targeted visual cues without loading a sophisticated and separately pre-trained image embedder. FormNetV2 establishes new state-of-the-art performance on FUNSD, CORD, SROIE and Payment benchmarks with a more compact model size.
    Robustly Learning a Single Neuron via Sharpness. (arXiv:2306.07892v1 [cs.LG])
    We study the problem of learning a single neuron with respect to the $L_2^2$-loss in the presence of adversarial label noise. We give an efficient algorithm that, for a broad family of activations including ReLUs, approximates the optimal $L_2^2$-error within a constant factor. Our algorithm applies under much milder distributional assumptions compared to prior work. The key ingredient enabling our results is a novel connection to local error bounds from optimization theory.
    Offline Policy Evaluation and Optimization under Confounding. (arXiv:2211.16583v3 [stat.ML] UPDATED)
    Evaluating and optimizing policies in the presence of unobserved confounders is a problem of growing interest in offline reinforcement learning. Using conventional methods for offline RL in the presence of confounding can not only lead to poor decisions and poor policies, but can also have disastrous effects in critical applications such as healthcare and education. We map out the landscape of offline policy evaluation for confounded MDPs, distinguishing assumptions on confounding based on their time-evolution and effect on the data-collection policies. We determine when consistent value estimates are not achievable, providing and discussing algorithms to estimate lower bounds with guarantees in those cases. When consistent estimates are achievable, we provide sample complexity guarantees. We also present new algorithms for offline policy improvement and prove local convergence guarantees. Finally, we experimentally evaluate our algorithms on gridworld and a simulated healthcare setting of managing sepsis patients. We note that in gridworld, our model-based method provides tighter lower bounds than existing methods, while in the sepsis simulator, our methods significantly outperform confounder-oblivious benchmarks.
    Class Attribute Inference Attacks: Inferring Sensitive Class Information by Diffusion-Based Attribute Manipulations. (arXiv:2303.09289v2 [cs.LG] UPDATED)
    Neural network-based image classifiers are powerful tools for computer vision tasks, but they inadvertently reveal sensitive attribute information about their classes, raising concerns about their privacy. To investigate this privacy leakage, we introduce the first Class Attribute Inference Attack (CAIA), which leverages recent advances in text-to-image synthesis to infer sensitive attributes of individual classes in a black-box setting, while remaining competitive with related white-box attacks. Our extensive experiments in the face recognition domain show that CAIA can accurately infer undisclosed sensitive attributes, such as an individual's hair color, gender, and racial appearance, which are not part of the training labels. Interestingly, we demonstrate that adversarial robust models are even more vulnerable to such privacy leakage than standard models, indicating that a trade-off between robustness and privacy exists.
    Variational Positive-incentive Noise: How Noise Benefits Models. (arXiv:2306.07651v1 [cs.LG])
    A large number of works aim to alleviate the impact of noise due to an underlying conventional assumption of the negative role of noise. However, some existing works show that the assumption does not always hold. In this paper, we investigate how to benefit the classical models by random noise under the framework of Positive-incentive Noise (Pi-Noise). Since the ideal objective of Pi-Noise is intractable, we propose to optimize its variational bound instead, namely variational Pi-Noise (VPN). With the variational inference, a VPN generator implemented by neural networks is designed for enhancing base models and simplifying the inference of base models, without changing the architecture of base models. Benefiting from the independent design of base models and VPN generators, the VPN generator can work with most existing models. From the experiments, it is shown that the proposed VPN generator can improve the base models. It is appealing that the trained variational VPN generator prefers to blur the irrelevant ingredients in complicated images, which meets our expectations.
    eP-ALM: Efficient Perceptual Augmentation of Language Models. (arXiv:2303.11403v2 [cs.CV] UPDATED)
    Large Language Models (LLMs) have so far impressed the world, with unprecedented capabilities that emerge in models at large scales. On the vision side, transformer models (i.e., ViT) are following the same trend, achieving the best performance on challenging benchmarks. With the abundance of such unimodal models, a natural question arises: do we also need to follow this trend to tackle multimodal tasks? In this work, we propose instead to direct effort toward efficient adaptations of existing models, and propose to augment Language Models with perception. Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency. In particular, they still train a large number of parameters, rely on large multimodal pretraining, use encoders (e.g., CLIP) trained on huge image-text datasets, and add significant inference overhead. In addition, most of these approaches have focused on Zero-Shot and In Context Learning, with little to no effort on direct finetuning. We investigate the minimal computational effort needed to adapt unimodal models for multimodal tasks and propose a new challenging setup, alongside different approaches, that efficiently adapts unimodal pretrained models. We show that by freezing more than 99\% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning across Image, Video, and Audio modalities, following the proposed setup. The code will be available here: https://github.com/mshukor/eP-ALM.
    Empirical Analysis of Model Selection for Heterogeneous Causal Effect Estimation. (arXiv:2211.01939v2 [cs.LG] UPDATED)
    We study the problem of model selection in causal inference, specifically for the case of conditional average treatment effect (CATE) estimation under binary treatments. Unlike model selection in machine learning, there is no perfect analogue of cross-validation as we do not observe the counterfactual potential outcome for any data point. Towards this, there have been a variety of proxy metrics proposed in the literature, that depend on auxiliary nuisance models estimated from the observed data (propensity score model, outcome regression model). However, the effectiveness of these metrics has only been studied on synthetic datasets as we can access the counterfactual data for them. We conduct an extensive empirical analysis to judge the performance of these metrics introduced in the literature, and novel ones introduced in this work, where we utilize the latest advances in generative modeling to incorporate multiple realistic datasets. Our analysis suggests novel model selection strategies based on careful hyperparameter tuning of CATE estimators and causal ensembling.
    Improving the Validity of Decision Trees as Explanations. (arXiv:2306.06777v2 [cs.LG] UPDATED)
    In classification and forecasting with tabular data, one often utilizes tree-based models. This can be competitive with deep neural networks on tabular data [cf. Grinsztajn et al., NeurIPS 2022, arXiv:2207.08815] and, under some conditions, explainable. The explainability depends on the depth of the tree and the accuracy in each leaf of the tree. Here, we train a low-depth tree with the objective of minimising the maximum misclassification error across each leaf node, and then "suspend" further tree-based models (e.g., trees of unlimited depth) from each leaf of the low-depth tree. The low-depth tree is easily explainable, while the overall statistical performance of the combined low-depth and suspended tree-based models improves upon decision trees of unlimited depth trained using classical methods (e.g., CART) and is comparable to state-of-the-art methods (e.g., well-tuned XGBoost).
    Density-Softmax: Scalable and Calibrated Uncertainty Estimation under Distribution Shifts. (arXiv:2302.06495v2 [cs.LG] UPDATED)
    Prevalent deterministic deep-learning models suffer from significant over-confidence under distribution shifts. Probabilistic approaches can reduce this problem but struggle with computational efficiency. In this paper, we propose Density-Softmax, a fast and lightweight deterministic method to improve calibrated uncertainty estimation via a combination of density function with the softmax layer. By using the latent representation's likelihood value, our approach produces more uncertain predictions when test samples are distant from the training samples. Theoretically, we show that Density-Softmax can produce high-quality uncertainty estimation with neural networks, as it is the solution of minimax uncertainty risk and is distance-aware, thus reducing the over-confidence of the standard softmax. Empirically, our method enjoys similar computational efficiency as a single forward pass deterministic with standard softmax on the shifted toy, vision, and language datasets across modern deep-learning architectures. Notably, Density-Softmax uses 4 times fewer parameters than Deep Ensembles and 6 times lower latency than Rank-1 Bayesian Neural Network, while obtaining competitive predictive performance and lower calibration errors under distribution shifts.
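    One plausible reading of the mechanism described above is that the logits are scaled by the (normalized) density of the input before the softmax, so low-density inputs yield flatter, more uncertain predictions. The sketch below assumes a likelihood already normalized to [0, 1] and omits the density model itself, which the paper fits on the latent representation.

```python
import math

def density_softmax(logits, likelihood):
    """Scale logits by the input's density estimate before the softmax;
    a likelihood near 0 pushes the output toward the uniform
    distribution, a likelihood near 1 leaves the softmax unchanged."""
    scaled = [z * likelihood for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

For the same logits, an in-distribution input (likelihood 1.0) keeps a confident prediction, while a far-off input (likelihood 0.1) is pulled toward uniform.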
    Large-scale pretraining on pathological images for fine-tuning of small pathological benchmarks. (arXiv:2303.15693v2 [cs.CV] UPDATED)
    Pretraining a deep learning model on large image datasets is a standard step before fine-tuning the model on small targeted datasets. The large dataset is usually general images (e.g. imagenet2012) while the small dataset can be specialized datasets that have different distributions from the large dataset. However, this 'large-to-small' strategy is not well-validated when the large dataset is specialized and has a similar distribution to small datasets. We newly compiled three hematoxylin and eosin-stained image datasets, one large (PTCGA200) and two magnification-adjusted small datasets (PCam200 and segPANDA200). Major deep learning models were trained with supervised and self-supervised learning methods and fine-tuned on the small datasets for tumor classification and tissue segmentation benchmarks. ResNet50 pretrained with MoCov2, SimCLR, and BYOL on PTCGA200 was better than imagenet2012 pretraining when fine-tuned on PTCGA200 (accuracy of 83.94%, 86.41%, 84.91%, and 82.72%, respectively). ResNet50 pre-trained on PTCGA200 with MoCov2 exceeded the COCOtrain2017-pretrained baseline and was the best in ResNet50 for the tissue segmentation benchmark (mIoU of 63.53% and 63.22%). We found re-training imagenet-pretrained models (ResNet50, BiT-M-R50x1, and ViT-S/16) on PTCGA200 improved downstream benchmarks.
    SqueezeLLM: Dense-and-Sparse Quantization. (arXiv:2306.07629v1 [cs.CL])
    Generative Large Language Models (LLMs) have demonstrated remarkable results for a wide range of tasks. However, deploying these models for inference has been a significant challenge due to their unprecedented resource requirements. This has forced existing deployment frameworks to use multi-GPU inference pipelines, which are often complex and costly, or to use smaller and less performant models. In this work, we demonstrate that the main bottleneck for generative inference with LLMs is memory bandwidth, rather than compute, specifically for single batch inference. While quantization has emerged as a promising solution by representing model weights with reduced precision, previous efforts have often resulted in notable performance degradation. To address this, we introduce SqueezeLLM, a post-training quantization framework that not only enables lossless compression to ultra-low precisions of up to 3-bit, but also achieves higher quantization performance under the same memory constraint. Our framework incorporates two novel ideas: (i) sensitivity-based non-uniform quantization, which searches for the optimal bit precision assignment based on second-order information; and (ii) the Dense-and-Sparse decomposition that stores outliers and sensitive weight values in an efficient sparse format. When applied to the LLaMA models, our 3-bit quantization significantly reduces the perplexity gap from the FP16 baseline by up to 2.1x as compared to the state-of-the-art methods with the same memory requirement. Furthermore, when deployed on an A6000 GPU, our quantized models achieve up to 2.3x speedup compared to the baseline. Our code is open-sourced and available online.
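    The Dense-and-Sparse decomposition can be illustrated with a toy example: large-magnitude outliers are stored exactly in a sparse structure, while the remaining weights are snapped to a small non-uniform codebook. The fixed codebook below stands in for the paper's sensitivity-weighted centroids, and the threshold and names are illustrative assumptions.

```python
def dense_and_sparse(weights, codebook, outlier_thresh):
    """Keep weights with |w| above the threshold exactly in a sparse
    dict; quantize the rest to the nearest codebook entry (outlier
    positions are zeroed in the dense part)."""
    sparse = {i: w for i, w in enumerate(weights) if abs(w) > outlier_thresh}
    dense = [0.0 if i in sparse else min(codebook, key=lambda c: abs(c - w))
             for i, w in enumerate(weights)]
    return dense, sparse

def reconstruct(dense, sparse):
    """Recombine the quantized dense part with the exact sparse outliers."""
    return [sparse.get(i, d) for i, d in enumerate(dense)]
```

Pulling the rare outliers out of the dense tensor is what lets the codebook stay tiny (e.g., 3-bit) without the outliers stretching its range.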
    Implicit models, latent compression, intrinsic biases, and cheap lunches in community detection. (arXiv:2210.09186v6 [cs.SI] UPDATED)
    The task of community detection, which aims to partition a network into clusters of nodes to summarize its large-scale structure, has spawned the development of many competing algorithms with varying objectives. Some community detection methods are inferential, explicitly deriving the clustering objective through a probabilistic generative model, while other methods are descriptive, dividing a network according to an objective motivated by a particular application, making it challenging to compare these methods on the same scale. Here we present a solution to this problem that associates any community detection objective, inferential or descriptive, with its corresponding implicit network generative model. This allows us to compute the description length of a network and its partition under arbitrary objectives, providing a principled measure to compare the performance of different algorithms without the need for "ground truth" labels. Our approach also gives access to instances of the community detection problem that are optimal to any given algorithm, and in this way reveals intrinsic biases in popular descriptive methods, explaining their tendency to overfit. Using our framework, we compare a number of community detection methods on artificial networks, and on a corpus of over 500 structurally diverse empirical networks. We find that more expressive community detection methods exhibit consistently superior compression performance on structured data instances, without having degraded performance on a minority of situations where more specialized algorithms perform optimally. Our results undermine the implications of the "no free lunch" theorem for community detection, both conceptually and in practice, since it is confined to unstructured data instances, unlike relevant community detection problems which are structured by requirement.
    14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon. (arXiv:2306.06283v2 [cond-mat.mtrl-sci] UPDATED)
    Chemistry and materials science are complex. Recently, there have been great successes in addressing this complexity using data-driven or computational techniques. Yet, the necessity of input structured in very specific forms and the fact that there is an ever-growing number of tools creates usability and accessibility challenges. Coupled with the reality that much data in these disciplines is unstructured, the effectiveness of these tools is limited. Motivated by recent works that indicated that large language models (LLMs) might help address some of these issues, we organized a hackathon event on the applications of LLMs in chemistry, materials science, and beyond. This article chronicles the projects built as part of this hackathon. Participants employed LLMs for various applications, including predicting properties of molecules and materials, designing novel interfaces for tools, extracting knowledge from unstructured data, and developing new educational applications. The diverse topics and the fact that working prototypes could be generated in less than two days highlight that LLMs will profoundly impact the future of our fields. The rich collection of ideas and projects also indicates that the applications of LLMs are not limited to materials science and chemistry but offer potential benefits to a wide range of scientific disciplines.
    On Achieving Optimal Adversarial Test Error. (arXiv:2306.07544v1 [cs.LG])
    We first elucidate various fundamental properties of optimal adversarial predictors: the structure of optimal adversarial convex predictors in terms of optimal adversarial zero-one predictors, bounds relating the adversarial convex loss to the adversarial zero-one loss, and the fact that continuous predictors can get arbitrarily close to the optimal adversarial error for both convex and zero-one losses. Applying these results along with new Rademacher complexity bounds for adversarial training near initialization, we prove that for general data distributions and perturbation sets, adversarial training on shallow networks with early stopping and an idealized optimal adversary is able to achieve optimal adversarial test error. By contrast, prior theoretical work either considered specialized data distributions or only provided training error guarantees.
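As a minimal illustration (not the paper's construction), adversarial training with an idealized inner adversary can be sketched on a linear logistic model, where the worst-case L-infinity attack has a closed form; all function names below are illustrative:

```python
import numpy as np

def fgsm_perturb(X, w, b, y, eps):
    # For a linear logistic model, the loss gradient w.r.t. the input is
    # (sigmoid(w.x + b) - y) * w, so the optimal eps-bounded L-inf attack
    # shifts each coordinate by eps in the gradient's sign direction.
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return X + eps * np.sign((p - y)[:, None] * w[None, :])

def adversarial_train(X, y, eps=0.1, lr=0.5, steps=300, seed=0):
    """Inner maximization (FGSM attack) alternated with outer minimization
    (a gradient step on the adversarial logistic loss)."""
    rng = np.random.default_rng(seed)
    w, b = rng.normal(size=X.shape[1]) * 0.01, 0.0
    for _ in range(steps):
        X_adv = fgsm_perturb(X, w, b, y, eps)      # craft worst-case inputs
        p = 1.0 / (1.0 + np.exp(-(X_adv @ w + b)))
        w -= lr * X_adv.T @ (p - y) / len(X)       # descend the adversarial loss
        b -= lr * float(np.mean(p - y))
    return w, b
```

On data whose margin exceeds eps, the resulting predictor remains accurate on clean inputs while being trained entirely on perturbed ones.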
    Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach. (arXiv:2110.04514v2 [cs.LG] UPDATED)
    We target open-world feature extrapolation problem where the feature space of input data goes through expansion and a model trained on partially observed features needs to handle new features in test data without further retraining. The problem is of much significance for dealing with features incrementally collected from different fields. To this end, we propose a new learning paradigm with graph representation and learning. Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data. Based on our framework, we design two training strategies, a self-supervised approach and an inductive learning approach, to endow the model with extrapolation ability and alleviate feature-level over-fitting. We also provide theoretical analysis on the generalization error on test data with new features, which dissects the impact of training features and algorithms on generalization performance. Our experiments over several classification datasets and large-scale advertisement click prediction datasets demonstrate that our model can produce effective embeddings for unseen features and significantly outperforms baseline methods that adopt KNN and local aggregation.
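The upper model in the paper is a trained graph neural network; as a rough, assumption-laden sketch of the core idea, a single message-passing step on the feature-data bipartite graph lets a new feature inherit an embedding from the data points in which it is active:

```python
import numpy as np

def extrapolate_feature_embedding(new_feature_column, data_embeddings):
    """One message-passing step on the feature-data bipartite graph:
    a new feature's embedding is aggregated (here, by mean pooling) from
    the embeddings of the data points where that feature is nonzero."""
    active = new_feature_column != 0
    if not np.any(active):
        return np.zeros(data_embeddings.shape[1])
    return data_embeddings[active].mean(axis=0)
```

The trained GNN in the paper learns this aggregation rather than hard-coding a mean, but the graph construction is the same.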
    Interpretable Differencing of Machine Learning Models. (arXiv:2306.06473v2 [cs.LG] UPDATED)
Understanding the differences between machine learning (ML) models is of interest in scenarios ranging from choosing amongst a set of competing models, to updating a deployed model with new training data. In these cases, we wish to go beyond differences in overall metrics such as accuracy to identify where in the feature space the differences occur. We formalize this problem of model differencing as one of predicting a dissimilarity function of two ML models' outputs, subject to the representation of the differences being human-interpretable. Our solution is to learn a Joint Surrogate Tree (JST), which is composed of two conjoined decision tree surrogates for the two models. A JST provides an intuitive representation of differences and places the changes in the context of the models' decision logic. Context is important as it helps users to map differences to an underlying mental model of an AI system. We also propose a refinement procedure to increase the precision of a JST. We demonstrate, through an empirical evaluation, that such contextual differencing is concise and can be achieved with no loss in fidelity over naive approaches.
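The JST conjoins two tree surrogates; a simpler baseline in the same spirit (roughly one of the naive approaches the paper improves upon) fits a single tree directly to the disagreement indicator of the two models. The `ThresholdModel` class below is a toy stand-in for a trained classifier:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class ThresholdModel:
    """Toy stand-in for a trained classifier (illustrative only)."""
    def __init__(self, feature, threshold):
        self.feature, self.threshold = feature, threshold
    def predict(self, X):
        return (X[:, self.feature] > self.threshold).astype(int)

def difference_surrogate(model_a, model_b, X, max_depth=3):
    # Label each point by whether the two models disagree on it, then fit
    # a small, interpretable tree that predicts *where* disagreement occurs.
    disagree = (model_a.predict(X) != model_b.predict(X)).astype(int)
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    tree.fit(X, disagree)
    return tree
```

Reading the fitted tree's splits then localizes the disagreement region in feature-space terms, which is exactly the question overall accuracy cannot answer.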
    Rethink the Effectiveness of Text Data Augmentation: An Empirical Analysis. (arXiv:2306.07664v1 [cs.CL])
In recent years, language models (LMs) have made remarkable progress in advancing the field of natural language processing (NLP). However, the impact of data augmentation (DA) techniques on the fine-tuning (FT) performance of these LMs has been a topic of ongoing debate. In this study, we evaluate the effectiveness of three different FT methods in conjunction with back-translation across an array of 7 diverse NLP tasks, including classification and regression types, covering single-sentence and sentence-pair tasks. Contrary to prior assumptions that DA does not contribute to the enhancement of LMs' FT performance, our findings reveal that continued pre-training on augmented data can effectively improve the FT performance of the downstream tasks. In the most favourable case, continued pre-training improves the performance of FT by more than 10% in the few-shot learning setting. Our findings highlight the potential of DA as a powerful tool for bolstering LMs' performance.
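Back-translation itself is easy to sketch, with the machine-translation systems abstracted as user-supplied callables (the paper's exact MT setup is not reproduced here):

```python
def back_translate(texts, to_pivot, from_pivot):
    """Round-trip each text through a pivot language to obtain paraphrases.
    `to_pivot` and `from_pivot` are any machine-translation callables
    (e.g. wrapped MT models); toy substitutes suffice for illustration."""
    return [from_pivot(to_pivot(t)) for t in texts]

def augment_for_continued_pretraining(texts, to_pivot, from_pivot):
    # Continued pre-training uses the original plus the augmented text;
    # round-trips that come back verbatim are dropped, as they add nothing.
    paraphrases = back_translate(texts, to_pivot, from_pivot)
    return texts + [p for p, t in zip(paraphrases, texts) if p != t]
```

The augmented corpus then feeds a continued pre-training step before task fine-tuning, which is the configuration the study found effective.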
    Generated Graph Detection. (arXiv:2306.07758v1 [cs.CR])
Graph generative models are becoming increasingly effective for data distribution approximation and data augmentation, but they have also raised public concerns about malicious misuse and misinformation broadcasts, much as Deepfake visual and auditory media has. Hence it is essential to regulate the prevalence of generated graphs. To tackle this problem, we pioneer the formulation of the generated graph detection problem, which distinguishes generated graphs from real ones. We propose the first framework to systematically investigate a set of sophisticated models and their performance in four classification scenarios. Each scenario switches between seen and unseen datasets/generators during testing to get closer to real-world settings and progressively challenge the classifiers. Extensive experiments show that all the models are qualified for generated graph detection, with specific models having advantages in specific scenarios. Given the validated generality of the classifiers and their obliviousness to unseen datasets/generators, we draw the cautious conclusion that our solution can remain effective for a decent while to curb generated graph misuses.
    Automatic and Accurate Classification of Hotel Bathrooms from Images with Deep Learning. (arXiv:2306.07727v1 [cs.CV])
Hotel bathrooms are among the most important places in terms of customer satisfaction, and the ones where the most complaints are reported. To share their experiences, guests rate hotels, comment, and share images of their positive or negative impressions. An important part of the room images shared by guests is related to bathrooms. Guests tend to prove their satisfaction or dissatisfaction with the bathrooms with images in their comments. These positive or negative comments and visuals potentially affect prospective guests. In this study, two different versions of a deep learning algorithm were designed to classify hotel bathrooms as satisfactory (good) or unsatisfactory (bad, when any defects such as dirtiness, deficiencies, or malfunctions were present) by analyzing images. The better performer of the two models was determined through a series of extensive experimental studies. The models were trained for each of 144 combinations of 5 hyper-parameter sets with a data set containing more than 11 thousand bathroom images, specially created for this study. The "HotelBath" data set was also shared with the community with this study. Four different image sizes were considered: 128, 256, 512 and 1024 pixels in both directions. The classification performances of the models were measured with several metrics. Both algorithms showed very attractive performance even with many combinations of hyper-parameters, classifying bathroom images with very high accuracy: the top algorithm achieved an accuracy of 92.4% and an AUC (area under the curve) score of 0.967. In addition, other metrics also proved the success...
    Distribution Free Prediction Sets for Node Classification. (arXiv:2211.14555v2 [stat.ML] UPDATED)
    Graph Neural Networks (GNNs) are able to achieve high classification accuracy on many important real world datasets, but provide no rigorous notion of predictive uncertainty. Quantifying the confidence of GNN models is difficult due to the dependence between datapoints induced by the graph structure. We leverage recent advances in conformal prediction to construct prediction sets for node classification in inductive learning scenarios. We do this by taking an existing approach for conformal classification that relies on \textit{exchangeable} data and modifying it by appropriately weighting the conformal scores to reflect the network structure. We show through experiments on standard benchmark datasets using popular GNN models that our approach provides tighter and better calibrated prediction sets than a naive application of conformal prediction.
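A minimal sketch of the underlying split-conformal procedure, with an optional weight vector as a hook for the paper's structure-dependent reweighting (the specific network weights are not reproduced here):

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1, weights=None):
    """Split conformal classification. The score of a calibration point is
    1 - p(true class); the (1 - alpha) weighted quantile of these scores
    gives a threshold qhat, and a test point's prediction set contains every
    label whose score is below qhat. Uniform weights recover the standard
    exchangeable procedure; a nonuniform `weights` vector is where
    network-structure-dependent reweighting would enter."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    weights = np.ones(n) if weights is None else np.asarray(weights, float)
    order = np.argsort(scores)
    cum = np.cumsum(weights[order]) / (weights.sum() + 1.0)
    k = int(np.searchsorted(cum, 1.0 - alpha))   # first index reaching coverage
    qhat = scores[order][min(k, n - 1)]
    return [np.where(1.0 - p <= qhat)[0] for p in test_probs]
```

With exchangeable data this yields marginal coverage of at least 1 - alpha; the paper's contribution is choosing the weights so that an analogous guarantee survives graph-induced dependence.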
    A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning. (arXiv:2306.07465v1 [cs.LG])
We investigate learning the equilibria in non-stationary multi-agent systems and address the challenges that differentiate multi-agent learning from single-agent learning. Specifically, we focus on games with bandit feedback, where testing an equilibrium can result in substantial regret even when the gap to be tested is small, and the existence of multiple optimal solutions (equilibria) in stationary games poses extra challenges. To overcome these obstacles, we propose a versatile black-box approach applicable to a broad spectrum of problems, such as general-sum games, potential games, and Markov games, when equipped with appropriate learning and testing oracles for stationary environments. Our algorithms can achieve $\widetilde{O}\left(\Delta^{1/4}T^{3/4}\right)$ regret when the degree of nonstationarity, as measured by total variation $\Delta$, is known, and $\widetilde{O}\left(\Delta^{1/5}T^{4/5}\right)$ regret when $\Delta$ is unknown, where $T$ is the number of rounds. Meanwhile, our algorithm inherits a favorable dependence on the number of agents from the oracles. As a side contribution that may be of independent interest, we show how to test for various types of equilibria by a black-box reduction to single-agent learning, including Nash equilibria, correlated equilibria, and coarse correlated equilibria.
    Collaborative Machine Learning Model Building with Families Using Co-ML. (arXiv:2304.05444v2 [cs.HC] UPDATED)
    Existing novice-friendly machine learning (ML) modeling tools center around a solo user experience, where a single user collects only their own data to build a model. However, solo modeling experiences limit valuable opportunities for encountering alternative ideas and approaches that can arise when learners work together; consequently, it often precludes encountering critical issues in ML around data representation and diversity that can surface when different perspectives are manifested in a group-constructed data set. To address this issue, we created Co-ML -- a tablet-based app for learners to collaboratively build ML image classifiers through an end-to-end, iterative model-building process. In this paper, we illustrate the feasibility and potential richness of collaborative modeling by presenting an in-depth case study of a family (two children 11 and 14-years-old working with their parents) using Co-ML in a facilitated introductory ML activity at home. We share the Co-ML system design and contribute a discussion of how using Co-ML in a collaborative activity enabled beginners to collectively engage with dataset design considerations underrepresented in prior work such as data diversity, class imbalance, and data quality. We discuss how a distributed collaborative process, in which individuals can take on different model-building responsibilities, provides a rich context for children and adults to learn ML dataset design.
    Using Collision Momentum in Deep Reinforcement Learning Based Adversarial Pedestrian Modeling. (arXiv:2306.07525v1 [cs.RO])
    Recent research in pedestrian simulation often aims to develop realistic behaviors in various situations, but it is challenging for existing algorithms to generate behaviors that identify weaknesses in automated vehicles' performance in extreme and unlikely scenarios and edge cases. To address this, specialized pedestrian behavior algorithms are needed. Current research focuses on realistic trajectories using social force models and reinforcement learning based models. However, we propose a reinforcement learning algorithm that specifically targets collisions and better uncovers unique failure modes of automated vehicle controllers. Our algorithm is efficient and generates more severe collisions, allowing for the identification and correction of weaknesses in autonomous driving algorithms in complex and varied scenarios.
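One plausible shaping of such a collision-targeting reward (an assumed form, not the paper's exact function) scores collisions by impact momentum rather than with a flat bonus:

```python
import numpy as np

def collision_momentum_reward(collided, vehicle_mass, vehicle_velocity, pedestrian_velocity):
    """Reward-shaping sketch: instead of a constant bonus for any collision,
    score the adversarial pedestrian by the momentum at impact, steering
    training toward discovering the most severe failure cases."""
    if not collided:
        return 0.0
    relative_speed = float(np.linalg.norm(
        np.asarray(vehicle_velocity, float) - np.asarray(pedestrian_velocity, float)))
    return vehicle_mass * relative_speed
```

A graded signal like this gives the RL agent a gradient between "barely grazed" and "high-speed impact" scenarios, which a binary collision flag cannot provide.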
    A Trio Neural Model for Dynamic Entity Relatedness Ranking. (arXiv:1808.08316v4 [cs.IR] UPDATED)
Measuring entity relatedness is a fundamental task for many natural language processing and information retrieval applications. Prior work often studies entity relatedness in static settings and in an unsupervised manner. However, entities in the real world are often involved in many different relationships; consequently, entity relations are highly dynamic over time. In this work, we propose a neural network-based approach for dynamic entity relatedness, leveraging collective attention as supervision. Our model is capable of learning rich and different entity representations in a joint framework. Through extensive experiments on large-scale datasets, we demonstrate that our method achieves better results than competitive baselines.
    FewSOME: One-Class Few Shot Anomaly Detection with Siamese Networks. (arXiv:2301.06957v4 [cs.LG] UPDATED)
Recent Anomaly Detection techniques have progressed the field considerably but at the cost of increasingly complex training pipelines. Such techniques require large amounts of training data, resulting in computationally expensive algorithms that are unsuitable for settings where only a small amount of normal samples are available for training. We propose 'Few Shot anOMaly detection' (FewSOME), a deep One-Class Anomaly Detection algorithm with the ability to accurately detect anomalies having trained on 'few' examples of the normal class and no examples of the anomalous class. FewSOME has low complexity given its low data requirement and short training time. FewSOME is aided by pretrained weights with an architecture based on Siamese Networks. By means of an ablation study, we demonstrate how our proposed loss, 'Stop Loss', improves the robustness of FewSOME. Our experiments demonstrate that FewSOME performs at state-of-the-art level on benchmark datasets MNIST, CIFAR-10, F-MNIST and MVTec AD while training on only 30 normal samples, a minute fraction of the data that existing methods are trained on. Moreover, our experiments show FewSOME to be robust to contaminated datasets. We also report F1 score and balanced accuracy in addition to AUC as a benchmark for future techniques to be compared against. Code available: https://github.com/niamhbelton/FewSOME.
    Provably Learning Nash Policies in Constrained Markov Potential Games. (arXiv:2306.07749v1 [cs.LG])
Multi-agent reinforcement learning (MARL) addresses sequential decision-making problems with multiple agents, where each agent optimizes its own objective. In many real-world instances, the agents may not only want to optimize their objectives, but also ensure safe behavior. For example, in traffic routing, each car (agent) aims to reach its destination quickly (objective) while avoiding collisions (safety). Constrained Markov Games (CMGs) are a natural formalism for safe MARL problems, though generally intractable. In this work, we introduce and study Constrained Markov Potential Games (CMPGs), an important class of CMGs. We first show that a Nash policy for CMPGs can be found via constrained optimization. One tempting approach is to solve it by Lagrangian-based primal-dual methods. However, as we show, in contrast to the single-agent setting, CMPGs do not satisfy strong duality, rendering such approaches inapplicable and potentially unsafe. To solve the CMPG problem, we propose our algorithm Coordinate-Ascent for CMPGs (CA-CMPG), which provably converges to a Nash policy in tabular, finite-horizon CMPGs. Furthermore, we provide the first sample complexity bounds for learning Nash policies in unknown CMPGs, which, under additional assumptions, guarantee safe exploration.
    Conjugate Natural Selection. (arXiv:2208.13898v4 [cs.LG] UPDATED)
    We prove that Fisher-Rao natural gradient descent (FR-NGD) optimally approximates the continuous time replicator equation (an essential model of evolutionary dynamics), and term this correspondence "conjugate natural selection". This correspondence promises alternative approaches for evolutionary computation over continuous or high-dimensional hypothesis spaces. As a special case, FR-NGD also provides the optimal approximation of continuous Bayesian inference when hypotheses compete on the basis of predicting actual observations. In this case, the method avoids the need to compute prior probabilities. We demonstrate our findings on a non-convex optimization problem and a system identification task for a stochastic process with time-varying parameters.
    Improving Opinion-based Question Answering Systems Through Label Error Detection and Overwrite. (arXiv:2306.07499v1 [cs.CL])
Label error is a ubiquitous problem in annotated data. Large amounts of label error substantially degrade the quality of deep learning models. Existing methods to tackle the label error problem largely focus on the classification task, and either rely on task specific architecture or require non-trivial additional computations, which is undesirable or even unattainable for industry usage. In this paper, we propose LEDO: a model-agnostic and computationally efficient framework for Label Error Detection and Overwrite. LEDO is based on Monte Carlo Dropout combined with uncertainty metrics, and can be easily generalized to multiple tasks and data sets. Applying LEDO to an industry opinion-based question answering system demonstrates that it is effective at improving accuracy in all the core models. Specifically, LEDO brings 1.1% MRR gain for the retrieval model, 1.5% PR AUC improvement for the machine reading comprehension model, and 0.9% rise in the Average Precision for the ranker, on top of the strong baselines with a large-scale social media dataset. Importantly, LEDO is computationally efficient compared to methods that require loss function change, and cost-effective as the resulting data can be used in the same continuous training pipeline for production. Further analysis shows that these gains come from an improved decision boundary after cleaning the label errors in the training data.
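The core recipe, simplified: run several stochastic (dropout-enabled) forward passes, average them, and flag samples where the model confidently disagrees with the annotation. The sketch below assumes the MC predictions are already collected and omits the paper's specific uncertainty metrics:

```python
import numpy as np

def flag_label_errors(mc_probs, labels, conf_thresh=0.8):
    """mc_probs: (T, n_samples, n_classes) class probabilities from T
    dropout-enabled forward passes. A label is flagged as a likely error
    when the averaged prediction disagrees with it *confidently*; flagged
    labels are overwritten with the model's prediction."""
    mean_p = mc_probs.mean(axis=0)
    pred = mean_p.argmax(axis=1)
    conf = mean_p.max(axis=1)
    flagged = (pred != labels) & (conf >= conf_thresh)
    clean = np.where(flagged, pred, labels)   # overwrite step
    return flagged, clean
```

Being model-agnostic, the procedure only needs the stacked probabilities, which is what makes it cheap to bolt onto an existing production pipeline.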
    Towards Fair and Explainable AI using a Human-Centered AI Approach. (arXiv:2306.07427v1 [cs.CY])
    The rise of machine learning (ML) is accompanied by several high-profile cases that have stressed the need for fairness, accountability, explainability and trust in ML systems. The existing literature has largely focused on fully automated ML approaches that try to optimize for some performance metric. However, human-centric measures like fairness, trust, explainability, etc. are subjective in nature, context-dependent, and might not correlate with conventional performance metrics. To deal with these challenges, we explore a human-centered AI approach that empowers people by providing more transparency and human control. In this dissertation, we present 5 research projects that aim to enhance explainability and fairness in classification systems and word embeddings. The first project explores the utility/downsides of introducing local model explanations as interfaces for machine teachers (crowd workers). Our study found that adding explanations supports trust calibration for the resulting ML model and enables rich forms of teaching feedback. The second project presents D-BIAS, a causality-based human-in-the-loop visual tool for identifying and mitigating social biases in tabular datasets. Apart from fairness, we found that our tool also enhances trust and accountability. The third project presents WordBias, a visual interactive tool that helps audit pre-trained static word embeddings for biases against groups, such as females, or subgroups, such as Black Muslim females. The fourth project presents DramatVis Personae, a visual analytics tool that helps identify social biases in creative writing. Finally, the last project presents an empirical study aimed at understanding the cumulative impact of multiple fairness-enhancing interventions at different stages of the ML pipeline on fairness, utility and different population groups. We conclude by discussing some of the future directions.
    GQFedWAvg: Optimization-Based Quantized Federated Learning in General Edge Computing Systems. (arXiv:2306.07497v1 [cs.LG])
The optimal implementation of federated learning (FL) in practical edge computing systems remains an outstanding problem. In this paper, we propose an optimization-based quantized FL algorithm, which can appropriately fit a general edge computing system with uniform or nonuniform computing and communication resources at the workers. Specifically, we first present a new random quantization scheme and analyze its properties. Then, we propose a general quantized FL algorithm, namely GQFedWAvg, which applies the proposed quantization scheme to wisely chosen model update-related vectors and adopts a generalized mini-batch stochastic gradient descent (SGD) method with weighted averaging of local model updates in global model aggregation. Besides, GQFedWAvg has several adjustable algorithm parameters to flexibly adapt to the computing and communication resources at the server and workers. We also analyze the convergence of GQFedWAvg. Next, we optimize the algorithm parameters of GQFedWAvg to minimize the convergence error under the time and energy constraints. We successfully tackle the challenging non-convex problem using general inner approximation (GIA) and multiple delicate tricks. Finally, we interpret GQFedWAvg's function principle and show its considerable gains over existing FL algorithms using numerical results.
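The paper's quantization scheme is not reproduced here, but the standard unbiased stochastic quantizer it generalizes (QSGD-style) is easy to sketch:

```python
import numpy as np

def stochastic_quantize(v, s, rng):
    """Unbiased random quantization of a vector to s levels per coordinate.
    Each normalized magnitude |v_i|/||v|| * s lands between two integer
    levels and is rounded up with probability equal to its fractional part,
    so E[Q(v)] = v while only the norm, signs, and integer levels need to
    be communicated."""
    norm = np.linalg.norm(v)
    if norm == 0:
        return np.zeros_like(v)
    scaled = np.abs(v) / norm * s
    lower = np.floor(scaled)
    levels = lower + (rng.random(v.shape) < (scaled - lower))
    return norm * np.sign(v) * levels / s
```

Unbiasedness is what lets quantized local updates be averaged at the server without introducing systematic error; the level count s trades communication cost against variance.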
    Physics-Informed Neural Networks for Material Model Calibration from Full-Field Displacement Data. (arXiv:2212.07723v2 [cs.LG] UPDATED)
The identification of material parameters occurring in constitutive models has a wide range of applications in practice. One of these applications is the monitoring and assessment of the actual condition of infrastructure buildings, as the material parameters directly reflect the resistance of the structures to external impacts. Physics-informed neural networks (PINNs) have recently emerged as a suitable method for solving inverse problems. One advantage of this method is the straightforward inclusion of observation data: unlike grid-based methods, such as the least-squares finite element method (LS-FEM) approach, no computational grid and no interpolation of the data are required. In the current work, we propose PINNs for the calibration of constitutive models from full-field displacement and global force data in a realistic regime, using linear elasticity as an example. We show that conditioning and reformulation of the optimization problem play a crucial role in real-world applications. Among other measures, we identify the material parameters from initial estimates and balance the individual terms in the loss function. In order to reduce the dependency of the identified material parameters on local errors in the displacement approximation, we base the identification not on the stress boundary conditions but instead on the global balance of internal and external work. We demonstrate that the enhanced PINNs are capable of identifying material parameters from both experimental one-dimensional data and synthetic full-field displacement data in a realistic regime. Since displacement data measured by, e.g., a digital image correlation (DIC) system is noisy, we additionally investigate the robustness of the method to different levels of noise.
    Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations. (arXiv:2306.07919v1 [cs.LG])
    Imitation learning has achieved great success in many sequential decision-making tasks, in which a neural agent is learned by imitating collected human demonstrations. However, existing algorithms typically require a large number of high-quality demonstrations that are difficult and expensive to collect. Usually, a trade-off needs to be made between demonstration quality and quantity in practice. Targeting this problem, in this work we consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set. Some pioneering works have been proposed, but they suffer from many limitations, e.g., assuming a demonstration to be of the same optimality throughout time steps and failing to provide any interpretation w.r.t knowledge learned from the noisy set. Addressing these problems, we propose {\method} by evaluating and imitating at the sub-demonstration level, encoding action primitives of varying quality into different skills. Concretely, {\method} consists of a high-level controller to discover skills and a skill-conditioned module to capture action-taking policies, and is trained following a two-phase pipeline by first discovering skills with all demonstrations and then adapting the controller to only the clean set. A mutual-information-based regularization and a dynamic sub-demonstration optimality estimator are designed to promote disentanglement in the skill space. Extensive experiments are conducted over two gym environments and a real-world healthcare dataset to demonstrate the superiority of {\method} in learning from sub-optimal demonstrations and its improved interpretability by examining learned skills.
    Parting with Misconceptions about Learning-based Vehicle Motion Planning. (arXiv:2306.07962v1 [cs.RO])
    The release of nuPlan marks a new era in vehicle motion planning research, offering the first large-scale real-world dataset and evaluation schemes requiring both precise short-term planning and long-horizon ego-forecasting. Existing systems struggle to simultaneously meet both requirements. Indeed, we find that these tasks are fundamentally misaligned and should be addressed independently. We further assess the current state of closed-loop planning in the field, revealing the limitations of learning-based methods in complex real-world scenarios and the value of simple rule-based priors such as centerline selection through lane graph search algorithms. More surprisingly, for the open-loop sub-task, we observe that the best results are achieved when using only this centerline as scene context (\ie, ignoring all information regarding the map and other agents). Combining these insights, we propose an extremely simple and efficient planner which outperforms an extensive set of competitors, winning the nuPlan planning challenge 2023.
    "Private Prediction Strikes Back!'' Private Kernelized Nearest Neighbors with Individual Renyi Filter. (arXiv:2306.07381v1 [cs.LG])
    Most existing approaches of differentially private (DP) machine learning focus on private training. Despite its many advantages, private training lacks the flexibility in adapting to incremental changes to the training dataset such as deletion requests from exercising GDPR's right to be forgotten. We revisit a long-forgotten alternative, known as private prediction, and propose a new algorithm named Individual Kernelized Nearest Neighbor (Ind-KNN). Ind-KNN is easily updatable over dataset changes and it allows precise control of the R\'{e}nyi DP at an individual user level -- a user's privacy loss is measured by the exact amount of her contribution to predictions; and a user is removed if her prescribed privacy budget runs out. Our results show that Ind-KNN consistently improves the accuracy over existing private prediction methods for a wide range of $\epsilon$ on four vision and language tasks. We also illustrate several cases under which Ind-KNN is preferable over private training with NoisySGD.
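A heavily simplified sketch of the idea, with Gaussian noise and a plain budget counter standing in for the paper's kernel weights and exact individual Renyi accounting:

```python
import numpy as np

def private_knn_predict(X, y, budgets, x, k=5, sigma=0.5, cost=1.0, n_classes=2, rng=None):
    """Individual-budget private prediction, simplified: only users with
    remaining budget may contribute, the label histogram of the k nearest
    eligible neighbors is privatized with Gaussian noise, and each
    contributing user is charged per query. Users whose budget runs out are
    effectively removed; deletion requests can likewise zero a budget."""
    if rng is None:
        rng = np.random.default_rng(0)
    alive = np.where(budgets >= cost)[0]
    nearest = alive[np.argsort(np.linalg.norm(X[alive] - x, axis=1))[:k]]
    hist = np.bincount(y[nearest], minlength=n_classes).astype(float)
    hist += rng.normal(0.0, sigma, size=n_classes)   # privatize the vote histogram
    budgets[nearest] -= cost                         # charge contributing users
    return int(np.argmax(hist))
```

Because the training set is only touched at prediction time, honoring a deletion request is as simple as dropping (or de-funding) that user's rows, with no retraining.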
    Factor-augmented tree ensembles. (arXiv:2111.14000v6 [stat.ML] UPDATED)
This manuscript proposes to extend the information set of time-series regression trees with latent stationary factors extracted via state-space methods. In doing so, this approach generalises time-series regression trees along two dimensions. First, it allows handling predictors that exhibit measurement error, non-stationary trends, seasonality and/or irregularities such as missing observations. Second, it gives a transparent way of using domain-specific theory to inform time-series regression trees. Empirically, ensembles of these factor-augmented trees provide a reliable approach for macro-finance problems. This article highlights this by focusing on the lead-lag effect between equity volatility and the business cycle in the United States.
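A simplified sketch of the pipeline, with PCA standing in for the manuscript's state-space factor extraction (which additionally handles missing data, trends, and seasonality):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeRegressor

def factor_augmented_tree(X_predictors, y, n_factors=1, max_depth=4):
    """Augment a regression tree's information set with latent factors.
    The manuscript extracts stationary factors with state-space methods;
    plain PCA is used here as an assumed, simplified stand-in."""
    factors = PCA(n_components=n_factors).fit_transform(X_predictors)
    design = np.hstack([X_predictors, factors])   # original info set + factors
    tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0).fit(design, y)
    return tree, design
```

An ensemble (e.g. bagging these trees) then plays the role of the factor-augmented tree ensembles evaluated in the article.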
    When Does Uncertainty Matter?: Understanding the Impact of Predictive Uncertainty in ML Assisted Decision Making. (arXiv:2011.06167v3 [cs.LG] UPDATED)
    As machine learning (ML) models are increasingly being employed to assist human decision makers, it becomes critical to provide these decision makers with relevant inputs which can help them decide if and how to incorporate model predictions into their decision making. For instance, communicating the uncertainty associated with model predictions could potentially be helpful in this regard. In this work, we carry out user studies (1,330 responses from 190 participants) to systematically assess how people with differing levels of expertise respond to different types of predictive uncertainty (i.e., posterior predictive distributions with different shapes and variances) in the context of ML assisted decision making for predicting apartment rental prices. We found that showing posterior predictive distributions led to smaller disagreements with the ML model's predictions, regardless of the shapes and variances of the posterior predictive distributions we considered, and that these effects may be sensitive to expertise in both ML and the domain. This suggests that posterior predictive distributions can potentially serve as useful decision aids which should be used with caution and take into account the type of distribution and the expertise of the human.
    Getting the Most from Eye-Tracking: User-Interaction Based Reading Region Estimation Dataset and Models. (arXiv:2306.07455v1 [cs.HC])
A single digital newsletter usually contains many messages (regions). The time users spend reading each message, and their read level (skip/skim/read-in-detail), are important for platforms to understand their users' interests, personalize content, and make recommendations. Based on accurate but expensive-to-collect eye-tracker-recorded data, we built models that predict per-region reading time from easy-to-collect JavaScript browser tracking data. With eye-tracking, we collected 200k ground-truth datapoints on participants reading news in browsers. Then we trained machine learning and deep learning models to predict message-level reading time based on user interactions such as mouse position, scrolling, and clicking. We reached 27% percentage error in reading time estimation with a two-tower neural network based on user interactions only, against the eye-tracking ground truth data, while heuristic baselines have around 46% percentage error. We also discovered the benefits of replacing per-session models with per-timestamp models, and of adding user pattern features. We conclude with suggestions on developing message-level reading estimation techniques based on available data.
    Automating Microservices Test Failure Analysis using Kubernetes Cluster Logs. (arXiv:2306.07653v1 [cs.SE])
    Kubernetes is a free, open-source container orchestration system for deploying and managing Docker containers that host microservices. When a test fails, Kubernetes cluster logs help in determining the reason for the failure. However, as systems become more complex, identifying failure reasons manually becomes more difficult and time-consuming. This study aims to identify effective and efficient classification algorithms to automatically determine the failure reason. We compare five classification algorithms: Support Vector Machines, K-Nearest Neighbors, Random Forest, Gradient Boosting Classifier, and Multilayer Perceptron. Our results indicate that Random Forest produces good accuracy while requiring fewer computational resources than the other algorithms.
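    As an illustration of the comparison described above, a minimal sketch with scikit-learn; the synthetic data stands in for the paper's Kubernetes log features, which are not specified here, so the feature construction is entirely hypothetical:

```python
# Hedged sketch: comparing the five classifiers named in the abstract.
# make_classification is a placeholder for log-derived features.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, n_classes=4,
                           n_informative=8, random_state=0)

models = {
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "RandomForest": RandomForestClassifier(random_state=0),
    "GradientBoosting": GradientBoostingClassifier(random_state=0),
    "MLP": MLPClassifier(max_iter=500, random_state=0),
}

for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()  # 5-fold CV accuracy
    print(f"{name}: {acc:.3f}")
```

    On real cluster logs, the features would instead be extracted from log lines (e.g., error-message tokens or event counts) before fitting.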
    Bandit Quickest Changepoint Detection. (arXiv:2107.10492v3 [cs.LG] UPDATED)
    Many industrial and security applications employ a suite of sensors for detecting abrupt changes in temporal behavior patterns. These abrupt changes typically manifest locally, rendering only a small subset of sensors informative. Continuous monitoring of every sensor can be expensive due to resource constraints, and serves as a motivation for the bandit quickest changepoint detection problem, where sensing actions (or sensors) are sequentially chosen, and only measurements corresponding to chosen actions are observed. We derive an information-theoretic lower bound on the detection delay for a general class of finitely parameterized probability distributions. We then propose a computationally efficient online sensing scheme, which seamlessly balances the need for exploration of different sensing options with exploitation of querying informative actions. We derive expected delay bounds for the proposed scheme and show that these bounds match our information-theoretic lower bounds at low false alarm rates, establishing optimality of the proposed method. We then perform a number of experiments on synthetic and real datasets demonstrating the effectiveness of our proposed method.
    V-LoL: A Diagnostic Dataset for Visual Logical Learning. (arXiv:2306.07743v1 [cs.AI])
    Despite the successes of recent developments in visual AI, different shortcomings still exist: from missing exact logical reasoning, to limited abstract generalization abilities, to difficulty understanding complex and noisy scenes. Unfortunately, existing benchmarks were not designed to capture more than a few of these aspects. Whereas deep learning datasets focus on visually complex data but simple visual reasoning tasks, inductive logic datasets involve complex logical learning tasks but lack the visual component. To address this, we propose the visual logical learning dataset, V-LoL, that seamlessly combines visual and logical challenges. Notably, we introduce the first instantiation of V-LoL, V-LoL-Trains -- a visual rendition of a classic benchmark in symbolic AI, the Michalski train problem. By incorporating intricate visual scenes and flexible logical reasoning tasks within a versatile framework, V-LoL-Trains provides a platform for investigating a wide range of visual logical learning challenges. We evaluate a variety of AI systems, including traditional symbolic AI, neural AI, and neuro-symbolic AI. Our evaluations demonstrate that even state-of-the-art AI faces difficulties in dealing with visual logical learning challenges, highlighting unique advantages and limitations specific to each methodology. Overall, V-LoL opens up new avenues for understanding and enhancing current abilities in visual logical learning for AI systems.
    Noisy Positive-Unlabeled Learning with Self-Training for Speculative Knowledge Graph Reasoning. (arXiv:2306.07512v1 [cs.LG])
    This paper studies the speculative reasoning task on real-world knowledge graphs (KG) that contain both a false-negative issue (i.e., potential true facts being excluded) and a false-positive issue (i.e., unreliable or outdated facts being included). State-of-the-art methods fall short in speculative reasoning ability, as they assume the correctness of a fact is solely determined by its presence in the KG, making them vulnerable to false-negative/positive issues. The new reasoning task is formulated as a noisy Positive-Unlabeled learning problem. We propose a variational framework, namely nPUGraph, that jointly estimates the correctness of both collected and uncollected facts (which we call the label posterior) and updates model parameters during training. The label posterior estimation facilitates speculative reasoning from two perspectives. First, it improves the robustness of a label-posterior-aware graph encoder against false positive links. Second, it identifies missing facts to provide high-quality grounds for reasoning. They are unified in a simple yet effective self-training procedure. Empirically, extensive experiments on three benchmark KGs and one Twitter dataset with various degrees of false negative/positive cases demonstrate the effectiveness of nPUGraph.
    End-to-End Label Uncertainty Modeling in Speech Emotion Recognition using Bayesian Neural Networks and Label Distribution Learning. (arXiv:2209.15449v2 [eess.AS] UPDATED)
    To train machine learning algorithms to predict emotional expressions in terms of arousal and valence, annotated datasets are needed. However, as different people perceive others' emotional expressions differently, their annotations are subjective. To account for this, annotations are typically collected from multiple annotators and averaged to obtain ground-truth labels. However, when exclusively trained on this averaged ground-truth, the model is agnostic to the inherent subjectivity in emotional expressions. In this work, we therefore propose an end-to-end Bayesian neural network capable of being trained on a distribution of annotations to also capture the subjectivity-based label uncertainty. Instead of a Gaussian, we model the annotation distribution using Student's t-distribution, which also accounts for the number of annotations available. We derive the corresponding Kullback-Leibler divergence loss and use it to train an estimator for the annotation distribution, from which the mean and uncertainty can be inferred. We validate the proposed method using two in-the-wild datasets. We show that the proposed t-distribution based approach achieves state-of-the-art uncertainty modeling results in speech emotion recognition, and also consistent results in cross-corpora evaluations. Furthermore, analyses reveal that the advantage of a t-distribution over a Gaussian grows with increasing inter-annotator correlation and a decreasing number of annotations available.
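    A minimal sketch of the distribution-modeling idea above, assuming SciPy and an invented set of five annotator ratings; it contrasts a Gaussian fit with a Student's t fit, without reproducing the paper's Bayesian network or KL-divergence loss:

```python
# Hedged sketch: modeling per-sample emotion annotations with a Student's
# t-distribution instead of a Gaussian. Annotation values are invented.
import numpy as np
from scipy import stats

annotations = np.array([0.1, 0.15, 0.2, 0.12, 0.6])  # 5 annotators, 1 outlier

# Gaussian fit: mean/std are pulled toward the outlying annotation.
mu, sigma = annotations.mean(), annotations.std(ddof=1)

# Student's t fit (MLE): heavier tails accommodate the outlier, which
# matters when only a handful of annotations is available.
df, loc, scale = stats.t.fit(annotations)

print(f"Gaussian:  mu={mu:.3f}, sigma={sigma:.3f}")
print(f"Student t: df={df:.2f}, loc={loc:.3f}, scale={scale:.3f}")
```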
    DeepTransition: Viability Leads to the Emergence of Gait Transitions in Learning Anticipatory Quadrupedal Locomotion Skills. (arXiv:2306.07419v1 [cs.RO])
    Quadruped animals seamlessly transition between gaits as they change locomotion speeds. While the most widely accepted explanation for gait transitions is energy efficiency, there is no clear consensus on the determining factor, nor on the potential effects from terrain properties. In this article, we propose that viability, i.e., the avoidance of falls, represents an important criterion for gait transitions. We investigate the emergence of gait transitions through the interaction between supraspinal drive (brain), the central pattern generator in the spinal cord, the body, and exteroceptive sensing by leveraging deep reinforcement learning and robotics tools. Consistent with quadruped animal data, we show that the walk-trot gait transition for quadruped robots on flat terrain improves both viability and energy efficiency. Furthermore, we investigate the effects of discrete terrain (i.e., crossing successive gaps) on imposing gait transitions, and find the emergence of trot-pronk transitions to avoid non-viable states. Compared with other potential criteria such as peak forces and energy efficiency, viability is the only improved factor after gait transitions on both flat and discrete gap terrains, suggesting that viability could be a primary and universal objective of gait transitions, while other criteria are secondary objectives and/or a consequence of viability. Moreover, we deploy our learned controller in sim-to-real hardware experiments and demonstrate state-of-the-art quadruped agility in challenging scenarios, where the Unitree A1 quadruped autonomously transitions gaits between trot and pronk to cross consecutive gaps of up to 30 cm (83.3% of the body length) at over 1.3 m/s.
    DAPPER: Label-Free Performance Estimation after Personalization for Heterogeneous Mobile Sensing. (arXiv:2111.11053v2 [cs.LG] UPDATED)
    Many applications utilize sensors in mobile devices and machine learning to provide novel services. However, various factors such as different users, devices, and environments impact the performance of such applications, thus making the domain shift (i.e., distributional shift between the training domain and the target domain) a critical issue in mobile sensing. Despite attempts in domain adaptation to solve this challenging problem, their performance is unreliable due to the complex interplay among diverse factors. In principle, the performance uncertainty can be identified and redeemed by performance validation with ground-truth labels. However, it is infeasible for every user to collect high-quality, sufficient labeled data. To address the issue, we present DAPPER (Domain AdaPtation Performance EstimatoR) that estimates the adaptation performance in a target domain with only unlabeled target data. Our key idea is to approximate the model performance based on the mutual information between the model inputs and corresponding outputs. Our evaluation with four real-world sensing datasets compared against six baselines shows that on average, DAPPER outperforms the state-of-the-art baseline by 39.8% in estimation accuracy. Moreover, our on-device experiment shows that DAPPER achieves up to 396X less computation overhead compared with the baselines.
    Robustness and Generalization Performance of Deep Learning Models on Cyber-Physical Systems: A Comparative Study. (arXiv:2306.07737v1 [cs.LG])
    Deep learning (DL) models have seen increased attention for time series forecasting, yet their application to cyber-physical systems (CPS) is hindered by these methods' limited robustness. Thus, this study evaluates the robustness and generalization performance of DL architectures on multivariate time series data from CPS. Our investigation focuses on the models' ability to handle a range of perturbations, such as sensor faults and noise, and assesses their impact on overall performance. Furthermore, we test the generalization and transfer learning capabilities of these models by exposing them to out-of-distribution (OOD) samples. These include deviations from standard system operations, while the core dynamics of the underlying physical system are preserved. Additionally, we test how well the models respond to several data augmentation techniques, including added noise and time warping. Our experimental framework utilizes a simulated three-tank system, proposed as a novel benchmark for evaluating the robustness and generalization performance of DL algorithms in CPS data contexts. The findings reveal that certain DL model architectures and training techniques exhibit superior effectiveness in handling OOD samples and various perturbations. These insights have significant implications for the development of DL models that deliver reliable and robust performance in real-world CPS applications.
    Stochastic coordinate transformations with applications to robust machine learning. (arXiv:2110.01729v3 [stat.ML] UPDATED)
    In this paper we introduce a set of novel features for identifying underlying stochastic behavior of input data using the Karhunen-Loève expansion. These novel features are constructed by applying a coordinate transformation based on recent Functional Data Analysis theory for anomaly detection. The associated signal decomposition is an exact hierarchical tensor product expansion with known optimality properties for approximating stochastic processes (random fields) with finite-dimensional function spaces. In principle these low-dimensional spaces can capture most of the stochastic behavior of `underlying signals' in a given nominal class, and can reject signals in alternative classes as stochastic anomalies. Using a hierarchical finite-dimensional expansion of the nominal class, a series of orthogonal nested subspaces is constructed for detecting anomalous signal components. Projection coefficients of input data in these subspaces are then used to train a Machine Learning (ML) classifier. Because the signal is split into nominal and anomalous projection components, clearer separation surfaces between the classes arise. In fact we show that with a sufficiently accurate estimation of the covariance structure of the nominal class, a sharp classification can be obtained. This is particularly advantageous for situations with large unbalanced datasets. We formulate this concept and demonstrate it on a number of high-dimensional datasets. This approach yields significant increases in accuracy over ML methods that use the original feature data. Our tests on the Alzheimer's Disease ADNI dataset show a dramatic increase in accuracy (from 48% to 89%). Furthermore, tests on unbalanced semi-synthetic datasets created from the GCM data confirmed increased accuracy as the dataset becomes more unbalanced.
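    The nominal-subspace idea can be sketched numerically: estimate the covariance of a nominal class, take its leading eigenvectors as a discrete Karhunen-Loève basis, and score signals by their energy outside that subspace. The signal model and dimensions below are illustrative, not the paper's datasets:

```python
# Hedged sketch: residual energy outside a nominal KL subspace flags
# anomalies. Nominal signals are smooth low-rank sinusoids plus noise.
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 400

# Nominal class: 3 sinusoidal modes with random coefficients + small noise.
basis = np.array([np.sin(np.pi * (k + 1) * np.linspace(0, 1, d)) for k in range(3)])
nominal = rng.normal(size=(n, 3)) @ basis + 0.05 * rng.normal(size=(n, d))

# Discrete KL expansion: eigenvectors of the nominal sample covariance.
cov = np.cov(nominal, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
top = eigvecs[:, -3:]                      # leading nominal subspace

def residual_energy(x):
    """Energy of x outside the nominal subspace (anomalous component)."""
    proj = top @ (top.T @ x)
    return float(np.sum((x - proj) ** 2))

x_nominal = rng.normal(size=3) @ basis + 0.05 * rng.normal(size=d)
x_anomaly = rng.normal(size=d)             # unstructured signal

print(residual_energy(x_nominal), residual_energy(x_anomaly))
```

    In the paper's formulation the projection coefficients themselves feed a downstream classifier; here the residual energy alone already separates the two classes.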
    Value function estimation using conditional diffusion models for control. (arXiv:2306.07290v1 [cs.LG])
    A fairly reliable trend in deep reinforcement learning is that performance scales with the number of parameters, provided a complementary scaling in the amount of training data. As the appetite for large models increases, it is imperative to address, sooner rather than later, the potential problem of running out of high-quality demonstrations. In this case, instead of collecting only new data via costly human demonstrations or risking a simulation-to-real transfer with uncertain effects, it would be beneficial to leverage vast amounts of readily available low-quality data. Since classical control algorithms such as behavior cloning or temporal difference learning cannot be used on reward-free or action-free data out-of-the-box, this solution warrants novel training paradigms for continuous control. We propose a simple algorithm called Diffused Value Function (DVF), which learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model. This model can be efficiently learned from state sequences (i.e., without access to reward functions or actions), and subsequently used to estimate the value of each action out-of-the-box. We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers, and show promising qualitative and quantitative results on challenging robotics benchmarks.
    Contrastive Learning-Based Audio to Lyrics Alignment for Multiple Languages. (arXiv:2306.07744v1 [cs.SD])
    Lyrics alignment has gained considerable attention in recent years. State-of-the-art systems either re-use established speech recognition toolkits, or design end-to-end solutions involving a Connectionist Temporal Classification (CTC) loss. However, both approaches suffer from specific weaknesses: toolkits are known for their complexity, and CTC systems use a loss designed for transcription which can limit alignment accuracy. In this paper, we instead use a contrastive learning procedure that derives cross-modal embeddings linking the audio and text domains. This way, we obtain a novel system that is simple to train end-to-end, can make use of weakly annotated training data, jointly learns a powerful text model, and is tailored to alignment. The system is not only the first to yield an average absolute error below 0.2 seconds on the standard Jamendo dataset but it is also robust to other languages, even when trained on English data only. Finally, we release word-level alignments for the JamendoLyrics Multi-Lang dataset.
    An extended physics informed neural network for preliminary analysis of parametric optimal control problems. (arXiv:2110.13530v2 [cs.LG] UPDATED)
    In this work we propose an extension of physics-informed supervised learning strategies to parametric partial differential equations. Indeed, even if the latter are indisputably useful in many applications, they can be computationally expensive, especially in real-time and many-query settings. Thus, our main goal is to provide a physics-informed learning paradigm to simulate parametrized phenomena in a small amount of time. The physics information is exploited in several ways: in the loss function (standard physics-informed neural networks), as an augmented input (extra feature employment), and as a guideline to build an effective structure for the neural network (physics-informed architecture). These three aspects, combined together, lead to a faster training phase and to a more accurate parametric prediction. The methodology has been tested on several equations and also in an optimal control framework.
    Tight Memory-Regret Lower Bounds for Streaming Bandits. (arXiv:2306.07903v1 [cs.LG])
    In this paper, we investigate the streaming bandits problem, wherein the learner aims to minimize regret by dealing with online arriving arms and sublinear arm memory. We establish the tight worst-case regret lower bound of $\Omega \left( (TB)^{\alpha} K^{1-\alpha}\right)$ with $\alpha = 2^{B} / (2^{B+1}-1)$ for any algorithm with a time horizon $T$, number of arms $K$, and number of passes $B$. The result reveals a separation between the stochastic bandits problem in the classical centralized setting and the streaming setting with bounded arm memory. Notably, in comparison to the well-known $\Omega(\sqrt{KT})$ lower bound, an additional double logarithmic factor is unavoidable for any streaming bandits algorithm with sublinear memory permitted. Furthermore, we establish the first instance-dependent lower bound of $\Omega \left(T^{1/(B+1)} \sum_{\Delta_x>0} \frac{\mu^*}{\Delta_x}\right)$ for streaming bandits. These lower bounds are derived through a unique reduction from the regret-minimization setting to the sample complexity analysis for a sequence of $\epsilon$-optimal arms identification tasks, which may be of independent interest. To complement the lower bound, we also provide a multi-pass algorithm that achieves a regret upper bound of $\tilde{O} \left( (TB)^{\alpha} K^{1 - \alpha}\right)$ using constant arm memory.
    How to Trust Your Diffusion Model: A Convex Optimization Approach to Conformal Risk Control. (arXiv:2302.03791v2 [stat.ML] UPDATED)
    Score-based generative modeling, informally referred to as diffusion models, continues to grow in popularity across several important domains and tasks. While such models provide high-quality and diverse samples from empirical distributions, important questions remain on the reliability and trustworthiness of these sampling procedures for their responsible use in critical scenarios. Conformal prediction is a modern tool to construct finite-sample, distribution-free uncertainty guarantees for any black-box predictor. In this work, we focus on image-to-image regression tasks and we present a generalization of the Risk-Controlling Prediction Sets (RCPS) procedure, that we term $K$-RCPS, which allows one to $(i)$ provide entrywise calibrated intervals for future samples of any diffusion model, and $(ii)$ control a certain notion of risk with respect to a ground truth image with minimal mean interval length. Differently from existing conformal risk control procedures, ours relies on a novel convex optimization approach that allows for multidimensional risk control while provably minimizing the mean interval length. We illustrate our approach on two real-world image denoising problems: on natural images of faces as well as on computed tomography (CT) scans of the abdomen, demonstrating state-of-the-art performance.
    Area is all you need: repeatable elements make stronger adversarial attacks. (arXiv:2306.07768v1 [cs.CV])
    Over the last decade, deep neural networks have achieved state-of-the-art performance in computer vision tasks. These models, however, are susceptible to unusual inputs, known as adversarial examples, that cause them to misclassify or otherwise fail to detect objects. Here, we provide evidence that the increasing success of adversarial attacks is primarily due to increasing their size. We then demonstrate a method for generating the largest possible adversarial patch by building an adversarial pattern out of repeatable elements. This approach achieves a new state of the art in evading detection by YOLOv2 and YOLOv3. Finally, we present an experiment that fails to replicate the prior success of several attacks published in this field, and end with some comments on testing and reproducibility.
    TART: A plug-and-play Transformer module for task-agnostic reasoning. (arXiv:2306.07536v1 [cs.LG])
    Large language models (LLMs) exhibit in-context learning abilities which enable the same model to perform several tasks without any task-specific training. In contrast, traditional adaptation approaches, such as fine-tuning, modify the underlying models for each specific task. In-context learning, however, consistently underperforms task-specific tuning approaches even when presented with the same examples. While most existing approaches (e.g., prompt engineering) focus on the LLM's learned representations to patch this performance gap, our analysis actually reveals that LLM representations contain sufficient information to make good predictions. As such, we focus on the LLM's reasoning abilities and demonstrate that this performance gap exists due to their inability to perform simple probabilistic reasoning tasks. This raises an intriguing question: Are LLMs actually capable of learning how to reason in a task-agnostic manner? We answer this in the affirmative and propose TART which generically improves an LLM's reasoning abilities using a synthetically trained Transformer-based reasoning module. TART trains this reasoning module in a task-agnostic manner using only synthetic logistic regression tasks and composes it with an arbitrary real-world pre-trained model without any additional training. With a single inference module, TART improves performance across different model families (GPT-Neo, Pythia, BLOOM), model sizes (100M - 6B), tasks (14 NLP binary classification tasks), and even across different modalities (audio and vision). Additionally, on the RAFT Benchmark, TART improves GPT-Neo (125M)'s performance such that it outperforms BLOOM (176B), and is within 4% of GPT-3 (175B). Our code and models are available at https://github.com/HazyResearch/TART .
    Urban Spatiotemporal Data Synthesis via Neural Disaggregation. (arXiv:2306.07292v1 [cs.LG])
    The level of granularity of open data often conflicts with the benefits it can provide. Less granular data can protect individual privacy but, to a certain degree, undermines the promise of open data to promote transparency and assist research. Similarly, in the urban setting, urban data aggregated at high-level geographic units can mask the underlying particularities of city dynamics that may vary at lower areal levels. In this work, we aim to synthesize fine-grained, high-resolution urban data by breaking down aggregated urban data at coarse, low-resolution geographic units. The goal is to increase the usability of highly aggregated urban data and realize as much of its value as possible. To address the simplicity of some traditional disaggregation methods: 1) We experimented with numerous neural-based models that are capable of modeling intricate non-linear relationships among features; neural methods can also leverage both spatial and temporal information concurrently. We showed that all neural methods perform better than traditional disaggregation methods, and that incorporating temporal information further enhances the results. 2) We proposed a training approach for the disaggregation task, Chain-of-Training (COT), that can be incorporated into any of the training-based models. COT adds transitional disaggregation steps by incorporating intermediate geographic dimensions, which enhances the predictions at the low geographic level and boosts the results at higher levels. 3) We adapted the idea of reconstruction (REC) from the super-resolution domain to our disaggregation case -- after disaggregating from the low to the high geographic level, we re-aggregate back to the low level from our generated high-level values. Both strategies improved disaggregation results on the three datasets and two cities we tested on.
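    The re-aggregation consistency check behind REC can be sketched in a few lines; the district-to-unit mapping and predicted shares below are invented for illustration:

```python
# Hedged sketch: disaggregate coarse totals across fine units according to
# predicted shares, then re-aggregate and check consistency (REC-style).
import numpy as np

coarse = np.array([100.0, 40.0])                 # e.g. totals for 2 districts
mapping = [[0, 1, 2], [3, 4]]                    # fine units in each district
weights = np.array([0.5, 0.3, 0.2, 0.7, 0.3])    # a model's predicted shares

def disaggregate(coarse, mapping, weights):
    """Split each coarse value across its fine units, proportional to weights."""
    fine = np.empty(len(weights))
    for d, units in enumerate(mapping):
        w = weights[units] / weights[units].sum()  # normalize within district
        fine[units] = coarse[d] * w
    return fine

fine = disaggregate(coarse, mapping, weights)
reagg = np.array([fine[u].sum() for u in mapping])
rec_error = np.abs(reagg - coarse).sum()          # consistency with the input
print(fine, rec_error)
```

    In the learned setting, the shares come from a neural model and the re-aggregation mismatch can serve as an extra training signal rather than being zero by construction.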
    Powderworld: A Platform for Understanding Generalization via Rich Task Distributions. (arXiv:2211.13051v2 [cs.AI] UPDATED)
    One of the grand challenges of reinforcement learning is the ability to generalize to new tasks. However, general agents require a set of rich, diverse tasks to train on. Designing a `foundation environment' for such tasks is tricky -- the ideal environment would support a range of emergent phenomena, an expressive task space, and fast runtime. To take a step towards addressing this research bottleneck, this work presents Powderworld, a lightweight yet expressive simulation environment running directly on the GPU. Within Powderworld, two motivating challenge distributions are presented, one for world-modelling and one for reinforcement learning. Each contains hand-designed test tasks to examine generalization. Experiments indicate that increasing the environment's complexity improves generalization for world models and certain reinforcement learning agents, yet may inhibit learning in high-variance environments. Powderworld aims to support the study of generalization by providing a source of diverse tasks arising from the same core rules.
    Causal Mediation Analysis with Multi-dimensional and Indirectly Observed Mediators. (arXiv:2306.07918v1 [cs.LG])
    Causal mediation analysis (CMA) is a powerful method to dissect the total effect of a treatment into direct and mediated effects within the potential outcome framework. This is important in many scientific applications to identify the underlying mechanisms of a treatment effect. However, in many scientific applications the mediator is unobserved, but there may exist related measurements. For example, we may want to identify how changes in brain activity or structure mediate an antidepressant's effect on behavior, but we may only have access to electrophysiological or imaging brain measurements. To date, most CMA methods assume that the mediator is one-dimensional and observable, which oversimplifies such real-world scenarios. To overcome this limitation, we introduce a CMA framework that can handle complex and indirectly observed mediators based on the identifiable variational autoencoder (iVAE) architecture. We prove that the true joint distribution over observed and latent variables is identifiable with the proposed method. Additionally, our framework captures a disentangled representation of the indirectly observed mediator and yields accurate estimation of the direct and mediated effects in synthetic and semi-synthetic experiments, providing evidence of its potential utility in real-world applications.
    Multiple Models for Recommending Temporal Aspects of Entities. (arXiv:1803.07890v3 [cs.IR] UPDATED)
    Entity aspect recommendation is an emerging task in semantic search that helps users discover serendipitous and prominent information with respect to an entity, of which salience (e.g., popularity) is the most important factor in previous work. However, entity aspects are temporally dynamic and often driven by events happening over time. For such cases, aspect suggestion based solely on salience features can give unsatisfactory results, for two reasons. First, salience is often accumulated over a long time period and does not account for recency. Second, many aspects related to an event entity are strongly time-dependent. In this paper, we study the task of temporal aspect recommendation for a given entity, which aims at recommending the most relevant aspects and takes into account time in order to improve search experience. We propose a novel event-centric ensemble ranking method that learns from multiple time and type-dependent models and dynamically trades off salience and recency characteristics. Through extensive experiments on real-world query logs, we demonstrate that our method is robust and achieves better effectiveness than competitive baselines.
    Counting Markov Equivalent Directed Acyclic Graphs Consistent with Background Knowledge. (arXiv:2206.06744v2 [cs.DS] UPDATED)
    A polynomial-time exact algorithm for counting the number of directed acyclic graphs in a Markov equivalence class was recently given by Wienöbst, Bannach, and Liśkiewicz (AAAI 2021). In this paper, we consider the more general problem of counting the number of directed acyclic graphs in a Markov equivalence class when the directions of some of the edges are also fixed (this setting arises, for example, when interventional data is partially available). This problem has been shown in earlier work to be complexity-theoretically hard. In contrast, we show that the problem is nevertheless tractable in an interesting class of instances, by establishing that it is ``fixed-parameter tractable''. In particular, our counting algorithm runs in time that is bounded by a polynomial in the size of the graph, where the degree of the polynomial does not depend upon the number of additional edges provided as input.
    Fixed points of arbitrarily deep 1-dimensional neural networks. (arXiv:2303.12814v2 [stat.ML] UPDATED)
    In this paper, we establish a sharp upper bound on the number of fixed points a certain class of neural networks can have. The networks under study (autoencoders) can be viewed as discrete dynamical systems whose nonlinearities are given by the choice of activation functions. To this end, we introduce a new class $\mathcal{F}$ of $C^1$ activation functions that is closed under composition, and contains e.g. the logistic sigmoid function. We use this class to show that any 1-dimensional neural network of arbitrary depth with activation functions in $\mathcal{F}$ has at most three fixed points. Due to the simple nature of such networks, we are able to completely understand their fixed points, providing a foundation for the much-needed connection between the application and theory of deep neural networks.
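    The fixed-point bound can be probed numerically: compose 1-D sigmoid layers and count the sign changes of f(x) - x on a fine grid. The weights below are arbitrary examples, not taken from the paper:

```python
# Hedged sketch: counting fixed points of deep 1-D sigmoid networks,
# illustrating the "at most three fixed points" bound from the abstract.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def network(x, layers):
    """A 1-D network: each layer is (weight, bias) followed by a sigmoid."""
    for w, b in layers:
        x = sigmoid(w * x + b)
    return x

def count_fixed_points(layers, lo=-5.0, hi=5.0, steps=20000):
    """Count zeros of f(x) - x via sign changes (and exact hits) on a grid."""
    count, prev = 0, network(lo, layers) - lo
    for i in range(1, steps + 1):
        x = lo + (hi - lo) * i / steps
        cur = network(x, layers) - x
        if cur == 0.0 or prev * cur < 0:
            count += 1
        prev = cur
    return count

# A steep layer creates three fixed points; deeper compositions stay at <= 3.
print(count_fixed_points([(12.0, -6.0)]))
print(count_fixed_points([(12.0, -6.0), (10.0, -5.0)]))
print(count_fixed_points([(1.0, 0.0)]))   # a shallow layer: one fixed point
```

    This grid count is only a numerical probe; the paper's result is a proof for the whole class $\mathcal{F}$, not an empirical observation.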
    Hidden Biases of End-to-End Driving Models. (arXiv:2306.07957v1 [cs.CV])
    End-to-end driving systems have recently made rapid progress, in particular on CARLA. Independent of their major contribution, they introduce changes to minor system components. Consequently, the source of improvements is unclear. We identify two biases that recur in nearly all state-of-the-art methods and are critical for the observed progress on CARLA: (1) lateral recovery via a strong inductive bias towards target point following, and (2) longitudinal averaging of multimodal waypoint predictions for slowing down. We investigate the drawbacks of these biases and identify principled alternatives. By incorporating our insights, we develop TF++, a simple end-to-end method that ranks first on the Longest6 and LAV benchmarks, gaining 14 driving score over the best prior work on Longest6.
    Effects of Data Enrichment with Image Transformations on the Performance of Deep Networks. (arXiv:2306.07724v1 [cs.CV])
    Images cannot always be expected to come in a certain standard format and orientation. Deep networks need to be trained to take into account unexpected variations in orientation or format. For this purpose, training data should be enriched to include different conditions. In this study, the effects of data enrichment on the performance of deep networks in the super resolution problem were investigated experimentally. A total of six basic image transformations were used for the enrichment procedures. In the experiments, two deep network models were trained with variants of the ILSVRC2012 dataset enriched by these six image transformation processes. Considering a single image transformation, it has been observed that the data enriched with 180 degree rotation provides the best results. The most unsuccessful result was obtained when the models were trained on the enriched data generated by the flip upside down process. Models scored highest when trained with a mix of all transformations.  ( 2 min )
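The enrichment procedure can be sketched with basic array operations; the six transformations below are an assumed set of this kind, not necessarily the paper's exact choices:

```python
import numpy as np

def enrich(img):
    # enrich one training image with basic geometric transformations
    return {
        "original": img,
        "rot90": np.rot90(img, 1),
        "rot180": np.rot90(img, 2),   # 180-degree rotation: best single transform in the study
        "rot270": np.rot90(img, 3),
        "flip_lr": np.fliplr(img),    # mirror left-right
        "flip_ud": np.flipud(img),    # flip upside down: worst single transform in the study
    }

img = np.arange(12).reshape(3, 4)
variants = enrich(img)
```

Training on the union of all such variants corresponds to the "mix of all transformations" setting that scored highest in the experiments.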
    Supervised-Contrastive Loss Learns Orthogonal Frames and Batching Matters. (arXiv:2306.07960v1 [cs.LG])
    Supervised contrastive loss (SCL) is a competitive and often superior alternative to the cross-entropy (CE) loss for classification. In this paper we ask: what differences in the learning process occur when the two different loss functions are being optimized? To answer this question, our main finding is that the geometry of embeddings learned by SCL forms an orthogonal frame (OF) regardless of the number of training examples per class. This is in contrast to the CE loss, for which previous work has shown that it learns embeddings geometries that are highly dependent on the class sizes. We arrive at our finding theoretically, by proving that the global minimizers of an unconstrained features model with SCL loss and entry-wise non-negativity constraints form an OF. We then validate the model's prediction by conducting experiments with standard deep-learning models on benchmark vision datasets. Finally, our analysis and experiments reveal that the batching scheme chosen during SCL training plays a critical role in determining the quality of convergence to the OF geometry. This finding motivates a simple algorithm wherein the addition of a few binding examples in each batch significantly speeds up the occurrence of the OF geometry.  ( 2 min )
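A minimal NumPy version of the supervised contrastive loss makes the geometric finding concrete: embeddings arranged as an orthogonal frame (one orthonormal direction per class) achieve a lower loss than collapsed embeddings. The batch and temperature below are illustrative choices:

```python
import numpy as np

def sup_con_loss(Z, labels, tau=0.5):
    # supervised contrastive loss over a batch of embeddings Z (one row each)
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    sim = Z @ Z.T / tau
    n = len(labels)
    total = 0.0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        others = [j for j in range(n) if j != i]
        log_denom = np.log(np.exp(sim[i, others]).sum())
        total += -np.mean([sim[i, j] - log_denom for j in pos])
    return total / n

labels = [0, 0, 1, 1]
# orthogonal frame: every example sits exactly on its class's orthonormal direction
frame = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
collapsed = np.ones((4, 2))  # all embeddings identical, ignoring the labels
```

Here `sup_con_loss(frame, labels)` is strictly smaller than `sup_con_loss(collapsed, labels)`, consistent with the orthogonal-frame geometry being the global minimizer in the paper's unconstrained features model.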
    Solving the Dirichlet problem for the Monge-Amp\`ere equation using neural networks. (arXiv:2110.03310v3 [stat.ML] UPDATED)
    The Monge-Amp\`ere equation is a fully nonlinear partial differential equation (PDE) of fundamental importance in analysis, geometry and in the applied sciences. In this paper we solve the Dirichlet problem associated with the Monge-Amp\`ere equation using neural networks and we show that an ansatz using deep input convex neural networks can be used to find the unique convex solution. As part of our analysis we study the effect of singularities, discontinuities and noise in the source function, we consider nontrivial domains, and we investigate how the method performs in higher dimensions. We investigate the convergence numerically and present error estimates based on a stability result. We also compare this method to an alternative approach in which standard feed-forward networks are used together with a loss function which penalizes lack of convexity.  ( 2 min )
    Von Mises Mixture Distributions for Molecular Conformation Generation. (arXiv:2306.07472v1 [physics.chem-ph])
    Molecules are frequently represented as graphs, but the underlying 3D molecular geometry (the locations of the atoms) ultimately determines most molecular properties. However, most molecules are not static and at room temperature adopt a wide variety of geometries or $\textit{conformations}$. The resulting distribution on geometries $p(x)$ is known as the Boltzmann distribution, and many molecular properties are expectations computed under this distribution. Generating accurate samples from the Boltzmann distribution is therefore essential for computing these expectations accurately. Traditional sampling-based methods are computationally expensive, and most recent machine learning-based methods have focused on identifying $\textit{modes}$ in this distribution rather than generating true $\textit{samples}$. Generating such samples requires capturing conformational variability, and it has been widely recognized that the majority of conformational variability in molecules arises from rotatable bonds. In this work, we present VonMisesNet, a new graph neural network that captures conformational variability via a variational approximation of rotatable bond torsion angles as a mixture of von Mises distributions. We demonstrate that VonMisesNet can generate conformations for arbitrary molecules in a way that is both physically accurate with respect to the Boltzmann distribution and orders of magnitude faster than existing sampling methods.  ( 2 min )
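Sampling a torsion angle from a mixture of von Mises distributions, as VonMisesNet's output head does, can be sketched directly with NumPy. The mixture parameters below are hypothetical values of the kind a network might predict for one rotatable bond (e.g. gauche-, anti, gauche+ modes):

```python
import numpy as np

rng = np.random.default_rng(0)

weights = np.array([0.25, 0.5, 0.25])            # mixture weights (sum to 1)
locs = np.array([-np.pi / 3, np.pi, np.pi / 3])  # mode locations in radians
kappas = np.array([8.0, 12.0, 8.0])              # concentrations (higher = sharper)

def sample_torsions(n):
    # pick a mixture component per sample, then draw from that von Mises
    comps = rng.choice(len(weights), size=n, p=weights)
    return rng.vonmises(locs[comps], kappas[comps])

angles = sample_torsions(10000)
```

Drawing one angle per rotatable bond in this way yields true samples of conformational variability, rather than only the modes of the distribution.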
    Attention-based Modeling of Physical Systems: Improved Latent Representations. (arXiv:2210.11269v5 [cs.LG] UPDATED)
We propose attention-based modeling of quantities at arbitrary spatial points conditioned on related measurements at different locations. Our approach adapts a transformer-encoder to process measurements and read-out positions together. Attention-based models exhibit excellent performance across domains, which makes them an interesting candidate for modeling data irregularly sampled in space. We introduce a novel encoding strategy that applies the same transformation to the measurements and read-out positions, after which they are combined with encoded measurement values instead of relying on two different mappings. Efficiently learning input-output mappings from irregularly-spaced data is a fundamental challenge in modeling physical phenomena. To evaluate the effectiveness of our model, we conduct experiments on diverse problem domains, including high-altitude wind nowcasting, two-day weather forecasting, fluid dynamics, and heat diffusion. Our attention-based model consistently outperforms state-of-the-art models, such as Graph Element Networks and Conditional Neural Processes, for modeling irregularly sampled data. Notably, our model reduces root mean square error (RMSE) for wind nowcasting, improving from 9.24 to 7.98, and for a heat diffusion task from 0.126 to 0.084. We hypothesize that this superior performance can be attributed to the enhanced flexibility of our latent representation and the improved data encoding technique. To support our hypothesis, we design a synthetic experiment that reveals excessive bottlenecking in the latent representations of alternative models, which hinders information utilization and impedes training.  ( 3 min )
    3D molecule generation by denoising voxel grids. (arXiv:2306.07473v1 [cs.LG])
    We propose a new score-based approach to generate 3D molecules represented as atomic densities on regular grids. First, we train a denoising neural network that learns to map from a smooth distribution of noisy molecules to the distribution of real molecules. Then, we follow the neural empirical Bayes framework [Saremi and Hyvarinen, 2019] and generate molecules in two steps: (i) sample noisy density grids from a smooth distribution via underdamped Langevin Markov chain Monte Carlo, and (ii) recover the ``clean'' molecule by denoising the noisy grid with a single step. Our method, VoxMol, generates molecules in a fundamentally different way than the current state of the art (i.e., diffusion models applied to atom point clouds). It differs in terms of the data representation, the noise model, the network architecture and the generative modeling algorithm. VoxMol achieves comparable results to state of the art on unconditional 3D molecule generation while being simpler to train and faster to generate molecules.  ( 2 min )
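The two-step walk-jump procedure can be illustrated in one dimension. For the sketch, the learned denoising network is replaced by the exact score of a Gaussian-smoothed toy dataset, and the walk uses plain overdamped Langevin dynamics rather than the underdamped variant; both are simplifying assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

data = np.array([-2.0, 2.0])  # toy 1-D stand-ins for clean "molecules"
sigma = 0.5                   # smoothing noise level

def score(x):
    # exact gradient of the log Gaussian-smoothed empirical density
    w = np.exp(-(x - data) ** 2 / (2 * sigma ** 2))
    w = w / w.sum()
    return (w * (data - x)).sum() / sigma ** 2

# (i) walk: Langevin MCMC samples from the smooth (noisy) density
x = 0.0
step = 0.01
for _ in range(2000):
    x += step * score(x) + np.sqrt(2 * step) * rng.standard_normal()

# (ii) jump: a single denoising step (Tweedie's formula) recovers a clean sample
clean = x + sigma ** 2 * score(x)
```

The jump step is what distinguishes this from running a diffusion model's full reverse chain: the noisy sample is mapped back to the data manifold in one shot.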
    ATT3D: Amortized Text-to-3D Object Synthesis. (arXiv:2306.07349v1 [cs.LG])
    Text-to-3D modelling has seen exciting progress by combining generative text-to-image models with image-to-3D methods like Neural Radiance Fields. DreamFusion recently achieved high-quality results but requires a lengthy, per-prompt optimization to create 3D objects. To address this, we amortize optimization over text prompts by training on many prompts simultaneously with a unified model, instead of separately. With this, we share computation across a prompt set, training in less time than per-prompt optimization. Our framework - Amortized text-to-3D (ATT3D) - enables knowledge-sharing between prompts to generalize to unseen setups and smooth interpolations between text for novel assets and simple animations.  ( 2 min )
    Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes. (arXiv:2306.07741v1 [cs.LG])
    Policy-based algorithms are among the most widely adopted techniques in model-free RL, thanks to their strong theoretical groundings and good properties in continuous action spaces. Unfortunately, these methods require precise and problem-specific hyperparameter tuning to achieve good performance, and tend to struggle when asked to accomplish a series of heterogeneous tasks. In particular, the selection of the step size has a crucial impact on their ability to learn a highly performing policy, affecting the speed and the stability of the training process, and often being the main culprit for poor results. In this paper, we tackle these issues with a Meta Reinforcement Learning approach, by introducing a new formulation, known as meta-MDP, that can be used to solve any hyperparameter selection problem in RL with contextual processes. After providing a theoretical Lipschitz bound to the difference of performance in different tasks, we adopt the proposed framework to train a batch RL algorithm to dynamically recommend the most adequate step size for different policies and tasks. In conclusion, we present an experimental campaign to show the advantages of selecting an adaptive learning rate in heterogeneous environments.  ( 2 min )
    Symmetry & Critical Points for Symmetric Tensor Decompositions Problems. (arXiv:2306.07886v1 [math.OC])
    We consider the non-convex optimization problem associated with the decomposition of a real symmetric tensor into a sum of rank one terms. Use is made of the rich symmetry structure to derive Puiseux series representations of families of critical points, and so obtain precise analytic estimates on the critical values and the Hessian spectrum. The sharp results make possible an analytic characterization of various geometric obstructions to local optimization methods, revealing in particular a complex array of saddles and local minima which differ by their symmetry, structure and analytic properties. A desirable phenomenon, occurring for all critical points considered, concerns the index of a point, i.e., the number of negative Hessian eigenvalues, increasing with the value of the objective function. Lastly, a Newton polytope argument is used to give a complete enumeration of all critical points of fixed symmetry, and it is shown that contrarily to the set of global minima which remains invariant under different choices of tensor norms, certain families of non-global minima emerge, others disappear.
    Partial Identification of Dose Responses with Hidden Confounders. (arXiv:2204.11206v3 [stat.ME] UPDATED)
    Inferring causal effects of continuous-valued treatments from observational data is a crucial task promising to better inform policy- and decision-makers. A critical assumption needed to identify these effects is that all confounding variables -- causal parents of both the treatment and the outcome -- are included as covariates. Unfortunately, given observational data alone, we cannot know with certainty that this criterion is satisfied. Sensitivity analyses provide principled ways to give bounds on causal estimates when confounding variables are hidden. While much attention is focused on sensitivity analyses for discrete-valued treatments, much less is paid to continuous-valued treatments. We present novel methodology to bound both average and conditional average continuous-valued treatment-effect estimates when they cannot be point identified due to hidden confounding. A semi-synthetic benchmark on multiple datasets shows our method giving tighter coverage of the true dose-response curve than a recently proposed continuous sensitivity model and baselines. Finally, we apply our method to a real-world observational case study to demonstrate the value of identifying dose-dependent causal effects.
    Time-aware Graph Structure Learning via Sequence Prediction on Temporal Graphs. (arXiv:2306.07699v1 [cs.LG])
Temporal Graph Learning, which aims to model the time-evolving nature of graphs, has gained increasing attention and achieved remarkable performance recently. However, in reality, graph structures are often incomplete and noisy, which hinders temporal graph networks (TGNs) from learning informative representations. Graph contrastive learning uses data augmentation to generate plausible variations of existing data and learn robust representations. However, rule-based augmentation approaches may be suboptimal as they lack learnability and fail to leverage rich information from downstream tasks. To address these issues, we propose a Time-aware Graph Structure Learning (TGSL) approach via sequence prediction on temporal graphs, which learns better graph structures for downstream tasks through adding potential temporal edges. In particular, it predicts time-aware context embedding based on previously observed interactions and uses the Gumbel-Top-K to select the closest candidate edges to this context embedding. Additionally, several candidate sampling strategies are proposed to ensure both efficiency and diversity. Furthermore, we jointly learn the graph structure and TGNs in an end-to-end manner and perform inference on the refined graph. Extensive experiments on temporal link prediction benchmarks demonstrate that TGSL yields significant gains for the popular TGNs such as TGAT and GraphMixer, and it outperforms other contrastive learning methods on temporal graphs. We will release the code in the future.  ( 2 min )
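The Gumbel-Top-K selection step can be sketched in a few lines. The scores below are hypothetical similarities between the predicted context embedding and candidate edges; adding i.i.d. Gumbel noise and taking the top k yields a differentiable-friendly sample of k candidates without replacement:

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_top_k(scores, k):
    # perturb the (log-)scores with Gumbel noise, then keep the k largest;
    # inclusion probabilities follow the softmax of the scores
    gumbel = -np.log(-np.log(rng.random(scores.shape)))
    return np.argsort(scores + gumbel)[::-1][:k]

# hypothetical context-to-candidate similarity scores for five candidate edges
scores = np.array([2.0, 0.1, 1.5, -0.3, 0.8])
chosen = gumbel_top_k(scores, k=2)
```

Over repeated draws, high-scoring candidates (here index 0) are selected far more often than low-scoring ones, while the noise keeps the selection diverse.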
    Connecting the Dots in Trustworthy Artificial Intelligence: From AI Principles, Ethics, and Key Requirements to Responsible AI Systems and Regulation. (arXiv:2305.02231v2 [cs.CY] UPDATED)
    Trustworthy Artificial Intelligence (AI) is based on seven technical requirements sustained over three main pillars that should be met throughout the system's entire life cycle: it should be (1) lawful, (2) ethical, and (3) robust, both from a technical and a social perspective. However, attaining truly trustworthy AI concerns a wider vision that comprises the trustworthiness of all processes and actors that are part of the system's life cycle, and considers previous aspects from different lenses. A more holistic vision contemplates four essential axes: the global principles for ethical use and development of AI-based systems, a philosophical take on AI ethics, a risk-based approach to AI regulation, and the mentioned pillars and requirements. The seven requirements (human agency and oversight; robustness and safety; privacy and data governance; transparency; diversity, non-discrimination and fairness; societal and environmental wellbeing; and accountability) are analyzed from a triple perspective: What each requirement for trustworthy AI is, Why it is needed, and How each requirement can be implemented in practice. On the other hand, a practical approach to implement trustworthy AI systems allows defining the concept of responsibility of AI-based systems facing the law, through a given auditing process. Therefore, a responsible AI system is the resulting notion we introduce in this work, and a concept of utmost necessity that can be realized through auditing processes, subject to the challenges posed by the use of regulatory sandboxes. Our multidisciplinary vision of trustworthy AI culminates in a debate on the diverging views published lately about the future of AI. Our reflections in this matter conclude that regulation is a key for reaching a consensus among these views, and that trustworthy and responsible AI systems will be crucial for the present and future of our society.  ( 3 min )
    FIRE: An Optimization Approach for Fast Interpretable Rule Extraction. (arXiv:2306.07432v1 [cs.LG])
    We present FIRE, Fast Interpretable Rule Extraction, an optimization-based framework to extract a small but useful collection of decision rules from tree ensembles. FIRE selects sparse representative subsets of rules from tree ensembles, that are easy for a practitioner to examine. To further enhance the interpretability of the extracted model, FIRE encourages fusing rules during selection, so that many of the selected decision rules share common antecedents. The optimization framework utilizes a fusion regularization penalty to accomplish this, along with a non-convex sparsity-inducing penalty to aggressively select rules. Optimization problems in FIRE pose a challenge to off-the-shelf solvers due to problem scale and the non-convexity of the penalties. To address this, making use of problem-structure, we develop a specialized solver based on block coordinate descent principles; our solver performs up to 40x faster than existing solvers. We show in our experiments that FIRE outperforms state-of-the-art rule ensemble algorithms at building sparse rule sets, and can deliver more interpretable models compared to existing methods.  ( 2 min )
    Medical Data Augmentation via ChatGPT: A Case Study on Medication Identification and Medication Event Classification. (arXiv:2306.07297v1 [cs.CL])
    The identification of key factors such as medications, diseases, and relationships within electronic health records and clinical notes has a wide range of applications in the clinical field. In the N2C2 2022 competitions, various tasks were presented to promote the identification of key factors in electronic health records (EHRs) using the Contextualized Medication Event Dataset (CMED). Pretrained large language models (LLMs) demonstrated exceptional performance in these tasks. This study aims to explore the utilization of LLMs, specifically ChatGPT, for data augmentation to overcome the limited availability of annotated data for identifying the key factors in EHRs. Additionally, different pre-trained BERT models, initially trained on extensive datasets like Wikipedia and MIMIC, were employed to develop models for identifying these key variables in EHRs through fine-tuning on augmented datasets. The experimental results of two EHR analysis tasks, namely medication identification and medication event classification, indicate that data augmentation based on ChatGPT proves beneficial in improving performance for both medication identification and medication event classification.  ( 2 min )
    Self-Supervised Hyperspectral Inpainting with the Optimisation inspired Deep Neural Network Prior. (arXiv:2306.07308v1 [eess.IV])
Hyperspectral images (HSIs) cover hundreds or thousands of narrow spectral bands, conveying a wealth of spatial and spectral information. However, due to instrumental errors and atmospheric changes, HSIs obtained in practice are often contaminated by noise and dead pixels (lines), resulting in missing information that may severely compromise subsequent applications. We introduce here a novel HSI missing-pixel prediction algorithm, called Low Rank and Sparsity Constraint Plug-and-Play (LRS-PnP). It is shown that LRS-PnP is able to predict missing pixels and bands even when all spectral bands of the image are missing. The proposed LRS-PnP algorithm is further extended to a self-supervised model by combining LRS-PnP with the Deep Image Prior (DIP), called LRS-PnP-DIP. In a series of experiments with real data, it is shown that LRS-PnP-DIP matches or outperforms the state-of-the-art learning-based inpainting methods.  ( 2 min )
    A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks. (arXiv:2306.07303v1 [cs.LG])
    Transformer is a deep neural network that employs a self-attention mechanism to comprehend the contextual relationships within sequential data. Unlike conventional neural networks or updated versions of Recurrent Neural Networks (RNNs) such as Long Short-Term Memory (LSTM), transformer models excel in handling long dependencies between input sequence elements and enable parallel processing. As a result, transformer-based models have attracted substantial interest among researchers in the field of artificial intelligence. This can be attributed to their immense potential and remarkable achievements, not only in Natural Language Processing (NLP) tasks but also in a wide range of domains, including computer vision, audio and speech processing, healthcare, and the Internet of Things (IoT). Although several survey papers have been published highlighting the transformer's contributions in specific fields, architectural differences, or performance evaluations, there is still a significant absence of a comprehensive survey paper encompassing its major applications across various domains. Therefore, we undertook the task of filling this gap by conducting an extensive survey of proposed transformer models from 2017 to 2022. Our survey encompasses the identification of the top five application domains for transformer-based models, namely: NLP, Computer Vision, Multi-Modality, Audio and Speech Processing, and Signal Processing. We analyze the impact of highly influential transformer-based models in these domains and subsequently classify them based on their respective tasks using a proposed taxonomy. Our aim is to shed light on the existing potential and future possibilities of transformers for enthusiastic researchers, thus contributing to the broader understanding of this groundbreaking technology.  ( 3 min )
    Splitting and Parallelizing of Quantum Convolutional Neural Networks for Learning Translationally Symmetric Data. (arXiv:2306.07331v1 [quant-ph])
    A quantum convolutional neural network (QCNN) is a promising quantum machine learning (QML) model to achieve quantum advantages in classically intractable problems. However, QCNN requires a large number of measurements for data learning, limiting its practical applications for large-scale problems. To relieve this requirement, we propose a novel architecture called split-parallelizing QCNN (sp-QCNN), which exploits the prior knowledge of quantum data for designing efficient circuits. This architecture draws inspiration from geometric quantum machine learning and targets translationally symmetric quantum data commonly encountered in condensed matter physics. By splitting the quantum circuit based on translational symmetry, sp-QCNN substantially parallelizes conventional QCNN without increasing the number of qubits and further improves the measurement efficiency by an order of the number of qubits. To demonstrate its effectiveness, we apply sp-QCNN to a quantum phase recognition task and show that it can achieve similar performance to conventional QCNN while considerably reducing the measurement resources required. Due to its high measurement efficiency, sp-QCNN can mitigate statistical errors in estimating the gradient of the loss function, thereby accelerating the learning process. These results open up new possibilities for incorporating the prior knowledge of data into the efficient design of QML models, leading to practical quantum advantages.
    Composing Efficient, Robust Tests for Policy Selection. (arXiv:2306.07372v1 [cs.LG])
    Modern reinforcement learning systems produce many high-quality policies throughout the learning process. However, to choose which policy to actually deploy in the real world, they must be tested under an intractable number of environmental conditions. We introduce RPOSST, an algorithm to select a small set of test cases from a larger pool based on a relatively small number of sample evaluations. RPOSST treats the test case selection problem as a two-player game and optimizes a solution with provable $k$-of-$N$ robustness, bounding the error relative to a test that used all the test cases in the pool. Empirical results demonstrate that RPOSST finds a small set of test cases that identify high quality policies in a toy one-shot game, poker datasets, and a high-fidelity racing simulator.
    Expressivity Enhancement with Efficient Quadratic Neurons for Convolutional Neural Networks. (arXiv:2306.07294v1 [cs.LG])
Convolutional neural networks (CNNs) have been successfully applied in a range of fields such as image classification and object segmentation. To improve their expressivity, various techniques, such as novel CNN architectures, have been explored. However, the performance gain from such techniques tends to diminish. To address this challenge, many researchers have shifted their focus to increasing the non-linearity of neurons, the fundamental building blocks of neural networks, to enhance the network expressivity. Nevertheless, most of these approaches introduce a large number of parameters and thus inevitably incur formidable computation cost, impairing their efficiency in practical deployment. In this work, an efficient quadratic neuron structure is proposed to preserve the non-linearity with only negligible parameter and computation cost overhead. The proposed quadratic neuron can maximize the utilization of second-order computation information to improve the network performance. The experimental results have demonstrated that the proposed quadratic neuron can achieve a higher accuracy and a better computation efficiency in classification tasks compared with both linear neurons and non-linear neurons from previous works.
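To see why second-order neurons add expressivity, consider a common low-rank quadratic parameterization (an assumed form for illustration, not necessarily the paper's exact structure): it can represent a product of inputs exactly, which no single linear neuron can, at only about three times the parameter count of a linear neuron:

```python
import numpy as np

def quadratic_neuron(x, wa, wb, wc, b):
    # low-rank quadratic form: two linear responses multiplied together,
    # plus an ordinary linear term and bias
    return (wa @ x) * (wb @ x) + wc @ x + b

# representing y = x1 * x2 exactly: pick wa and wb as the two coordinate axes
x = np.array([3.0, 4.0])
y = quadratic_neuron(x, np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.zeros(2), 0.0)
print(y)  # 12.0
```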
    Novel Regression and Least Square Support Vector Machine Learning Technique for Air Pollution Forecasting. (arXiv:2306.07301v1 [cs.LG])
Air pollution is the release of particulate matter, chemicals, or biological substances that harm humans and other living creatures or degrade the natural habitat and the atmosphere. Hence, air pollution remains one of the paramount environmental issues as far as metropolitan cities are concerned. Several air pollution indicators are known to have a negative influence on human health, and their improper detection results in severe complications for humans and other living creatures. To address this aspect, a novel technique called Discretized Regression and Least Square Support Vector (DR-LSSV) based air pollution forecasting is proposed. The results indicate that the proposed DR-LSSV technique can efficiently enhance air pollution forecasting performance and outperforms the conventional machine learning methods in terms of air pollution forecasting accuracy, air pollution forecasting time, and false positive rate.
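The least-squares support vector machine underlying such forecasters reduces training to a single linear system. A minimal regression sketch on toy data follows (the DR-LSSV pipeline's discretization step and real pollution data are omitted; the kernel width and regularization are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    # LS-SVM regression training is one linear solve:
    # [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]
    n = len(y)
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    rhs = np.concatenate([[0.0], y])
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]  # bias b, dual coefficients alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# toy 1-D signal standing in for a pollution time series
X = np.linspace(0, 6, 60)[:, None]
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(60)
b, alpha = lssvm_fit(X, y)
pred = lssvm_predict(X, b, alpha, X)
```

Replacing the quadratic program of a classical SVM with this equality-constrained least-squares problem is what makes LS-SVM training fast, at the cost of losing sparsity in the dual coefficients.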
    Knowledge Graph Contrastive Learning Based on Relation-Symmetrical Structure. (arXiv:2211.10738v4 [cs.AI] UPDATED)
Knowledge graph embedding (KGE) aims at learning powerful representations to benefit various artificial intelligence applications. Meanwhile, contrastive learning has been widely leveraged in graph learning as an effective mechanism to enhance the discriminative capacity of the learned representations. However, the complex structures of KG make it hard to construct appropriate contrastive pairs. Only a few attempts have integrated contrastive learning strategies with KGE. However, most of them rely on language models (e.g., BERT) for contrastive pair construction instead of fully mining information underlying the graph structure, hindering expressive ability. Surprisingly, we find that the entities within a relational symmetrical structure are usually similar and correlated. To this end, we propose a knowledge graph contrastive learning framework based on relation-symmetrical structure, KGE-SymCL, which mines symmetrical structure information in KGs to enhance the discriminative ability of KGE models. Concretely, a plug-and-play approach is proposed by taking entities in the relation-symmetrical positions as positive pairs. Besides, a self-supervised alignment loss is designed to pull together positive pairs. Experimental results on link prediction and entity classification datasets demonstrate that our KGE-SymCL can be easily adopted to various KGE models for performance improvements. Moreover, extensive experiments show that our model could outperform other state-of-the-art baselines.
    Aria Digital Twin: A New Benchmark Dataset for Egocentric 3D Machine Perception. (arXiv:2306.06362v2 [cs.CV] UPDATED)
    We introduce the Aria Digital Twin (ADT) - an egocentric dataset captured using Aria glasses with extensive object, environment, and human level ground truth. This ADT release contains 200 sequences of real-world activities conducted by Aria wearers in two real indoor scenes with 398 object instances (324 stationary and 74 dynamic). Each sequence consists of: a) raw data of two monochrome camera streams, one RGB camera stream, two IMU streams; b) complete sensor calibration; c) ground truth data including continuous 6-degree-of-freedom (6DoF) poses of the Aria devices, object 6DoF poses, 3D eye gaze vectors, 3D human poses, 2D image segmentations, image depth maps; and d) photo-realistic synthetic renderings. To the best of our knowledge, there is no existing egocentric dataset with a level of accuracy, photo-realism and comprehensiveness comparable to ADT. By contributing ADT to the research community, our mission is to set a new standard for evaluation in the egocentric machine perception domain, which includes very challenging research problems such as 3D object detection and tracking, scene reconstruction and understanding, sim-to-real learning, human pose prediction - while also inspiring new machine perception tasks for augmented reality (AR) applications. To kick start exploration of the ADT research use cases, we evaluated several existing state-of-the-art methods for object detection, segmentation and image translation tasks that demonstrate the usefulness of ADT as a benchmarking dataset.
    Additive Multi-Index Gaussian process modeling, with application to multi-physics surrogate modeling of the quark-gluon plasma. (arXiv:2306.07299v1 [nucl-th])
    The Quark-Gluon Plasma (QGP) is a unique phase of nuclear matter, theorized to have filled the Universe shortly after the Big Bang. A critical challenge in studying the QGP is that, to reconcile experimental observables with theoretical parameters, one requires many simulation runs of a complex physics model over a high-dimensional parameter space. Each run is computationally very expensive, requiring thousands of CPU hours, thus limiting physicists to only several hundred runs. Given limited training data for high-dimensional prediction, existing surrogate models often yield poor predictions with high predictive uncertainties, leading to imprecise scientific findings. To address this, we propose a new Additive Multi-Index Gaussian process (AdMIn-GP) model, which leverages a flexible additive structure on low-dimensional embeddings of the parameter space. This is guided by prior scientific knowledge that the QGP is dominated by multiple distinct physical phenomena (i.e., multiphysics), each involving a small number of latent parameters. The AdMIn-GP models for such embedded structures within a flexible Bayesian nonparametric framework, which facilitates efficient model fitting via a carefully constructed variational inference approach with inducing points. We show the effectiveness of the AdMIn-GP via a suite of numerical experiments and our QGP application, where we demonstrate considerably improved surrogate modeling performance over existing models.
    DreamDecompiler: Improved Bayesian Program Learning by Decompiling Amortised Knowledge. (arXiv:2306.07856v1 [cs.AI])
    Solving program induction problems requires searching through an enormous space of possibilities. DreamCoder is an inductive program synthesis system that, whilst solving problems, learns to simplify search in an iterative wake-sleep procedure. The cost of search is amortised by training a neural search policy, reducing search breadth and effectively "compiling" useful information to compose program solutions across tasks. Additionally, a library of program components is learnt to express discovered solutions in fewer components, reducing search depth. In DreamCoder, the neural search policy has only an indirect effect on the library learnt through the program solutions it helps discover. We present an approach for library learning that directly leverages the neural search policy, effectively "decompiling" its amortised knowledge to extract relevant program components. This provides stronger amortised inference: the amortised knowledge learnt to reduce search breadth is now also used to reduce search depth. We integrate our approach with DreamCoder and demonstrate faster domain proficiency with improved generalisation on a range of domains, particularly when fewer example solutions are available.
    Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments. (arXiv:2209.15090v3 [eess.SY] UPDATED)
    It is quite challenging to ensure the safety of reinforcement learning (RL) agents in an unknown and stochastic environment under hard constraints that require the system state not to reach certain specified unsafe regions. Many popular safe RL methods such as those based on the Constrained Markov Decision Process (CMDP) paradigm formulate safety violations in a cost function and try to constrain the expectation of cumulative cost under a threshold. However, it is often difficult to effectively capture and enforce hard reachability-based safety constraints indirectly with such constraints on safety violation costs. In this work, we leverage the notion of barrier function to explicitly encode the hard safety constraints, and given that the environment is unknown, relax them to our design of \emph{generative-model-based soft barrier functions}. Based on such soft barriers, we propose a safe RL approach that can jointly learn the environment and optimize the control policy, while effectively avoiding unsafe regions with safety probability optimization. Experiments on a set of examples demonstrate that our approach can effectively enforce hard safety constraints and significantly outperform CMDP-based baseline methods in system safety rate measured via simulations.
    Optimized Three Deep Learning Models Based-PSO Hyperparameters for Beijing PM2.5 Prediction. (arXiv:2306.07296v1 [cs.LG])
    Deep learning is a machine learning approach that produces excellent performance in various applications, including natural language processing, image identification, and forecasting. Deep learning network performance depends on the hyperparameter settings. This research attempts to optimize the deep learning architectures of Long short-term memory (LSTM), Convolutional neural network (CNN), and Multilayer perceptron (MLP) for forecasting tasks using Particle swarm optimization (PSO), a swarm intelligence-based metaheuristic optimization methodology: Proposed M-1 (PSO-LSTM), M-2 (PSO-CNN), and M-3 (PSO-MLP). The Beijing PM2.5 dataset was analyzed to measure the performance of the proposed models. PM2.5 as a target variable was affected by dew point, pressure, temperature, cumulated wind speed, hours of snow, and hours of rain. The deep learning network inputs consist of three different scenarios: daily, weekly, and monthly. The results show that the proposed M-1 with three hidden layers produces the best results of RMSE and MAPE compared to the proposed M-2, M-3, and all the baselines. A recommendation for air pollution management could be generated by using these optimized models.
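The PSO search at the heart of the proposed models can be sketched in a few lines; the sphere objective, swarm size, and inertia/acceleration coefficients below are illustrative assumptions, not the paper's configuration, and a real hyperparameter search would replace the objective with a validation-loss evaluation:

```python
import numpy as np

def pso(f, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer: each particle remembers its own best
    position, and the swarm shares a global best that pulls all particles."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n_particles, dim))
    v = np.zeros_like(x)
    pbest, pbest_val = x.copy(), np.array([f(p) for p in x])
    g = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)  # velocity update
        x = x + v
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        g = pbest[pbest_val.argmin()].copy()
    return g, float(pbest_val.min())

# toy objective standing in for a validation loss over 3 hyperparameters
best, val = pso(lambda p: float((p ** 2).sum()), dim=3)
```

In a hyperparameter setting, each particle coordinate would encode a setting such as the number of hidden units or the learning rate, rounded or rescaled as needed before training.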
    Discovery of Optimal Quantum Error Correcting Codes via Reinforcement Learning. (arXiv:2305.06378v2 [quant-ph] UPDATED)
    The recently introduced Quantum Lego framework provides a powerful method for generating complex quantum error correcting codes (QECCs) out of simple ones. We gamify this process and unlock a new avenue for code design and discovery using reinforcement learning (RL). One benefit of RL is that we can specify \textit{arbitrary} properties of the code to be optimized. We train on two such properties, maximizing the code distance, and minimizing the probability of logical error under biased Pauli noise. For the first, we show that the trained agent identifies ways to increase code distance beyond naive concatenation, saturating the linear programming bound for CSS codes on 13 qubits. With a learning objective to minimize the logical error probability under biased Pauli noise, we find the best known CSS code at this task for $\lesssim 20$ qubits. Compared to other (locally deformed) CSS codes, including Surface, XZZX, and 2D Color codes, our $[[17,1,3]]$ code construction actually has \textit{lower} adversarial distance, yet better protects the logical information, highlighting the importance of QECC desiderata. Lastly, we comment on how this RL framework can be used in conjunction with physical quantum devices to tailor a code without explicit characterization of the noise model.
    An Ensemble Machine Learning Approach for Tropical Cyclone Detection Using ERA5 Reanalysis Data. (arXiv:2306.07291v1 [physics.ao-ph])
    Tropical Cyclones (TCs) are counted among the most destructive phenomena that can be found in nature. Every year, globally an average of 90 TCs occur over tropical waters, and global warming is making them stronger, larger and more destructive. The accurate detection and tracking of such phenomena have become a relevant and interesting area of research in weather and climate science. Traditionally, TCs have been identified in large climate datasets through the use of deterministic tracking schemes that rely on subjective thresholds. Machine Learning (ML) models can complement deterministic approaches due to their ability to capture the mapping between the input climatic drivers and the geographical position of the TC center from the available data. This study presents an ML ensemble approach for locating TC center coordinates, embedding both TC classification and localization in a single end-to-end learning task. The ensemble combines TC center estimates of different ML models that agree about the presence of a TC in input data. ERA5 reanalysis data were used for model training and testing, jointly with the International Best Track Archive for Climate Stewardship records. Results showed that the ML approach is well-suited for TC detection, providing good generalization capabilities on out-of-sample data. In particular, it was able to accurately detect lower TC categories than those used for training the models. On top of this, the ensemble approach was able to further improve TC localization performance with respect to single model TC center estimates, demonstrating the good capabilities of the proposed approach.
    Multi-Platform Budget Management in Ad Markets with Non-IC Auctions. (arXiv:2306.07352v1 [cs.GT])
    In online advertising markets, budget-constrained advertisers acquire ad placements through repeated bidding in auctions on various platforms. We present a strategy for bidding optimally in a set of auctions that may or may not be incentive-compatible under the presence of budget constraints. Our strategy maximizes the expected total utility across auctions while satisfying the advertiser's budget constraints in expectation. Additionally, we investigate the online setting where the advertiser must submit bids across platforms while learning about other bidders' bids over time. Our algorithm has $O(T^{3/4})$ regret under the full-information setting. Finally, we demonstrate that our algorithms have superior cumulative regret on both synthetic and real-world datasets of ad placement auctions, compared to existing adaptive pacing algorithms.
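For context on the baseline family the paper compares against, a standard single-platform, dual-based pacing loop can be sketched as follows; the multiplicative bid shading, the step size, and the synthetic values/prices are illustrative assumptions, not the paper's multi-platform algorithm:

```python
import random

def paced_bidding(values, prices, budget, T, eps=0.05):
    """Dual-based budget pacing: shade each bid by 1/(1+mu) and move the
    multiplier mu toward spending the per-round budget rate budget/T."""
    mu, spend, utility = 0.0, 0.0, 0.0
    rate = budget / T
    for v, p in zip(values, prices):
        bid = v / (1.0 + mu)
        # win (and pay the posted price) only if the bid clears it and
        # the remaining budget covers the payment
        cost = p if (bid >= p and spend + p <= budget) else 0.0
        if cost:
            spend += cost
            utility += v - cost
        # dual (sub)gradient step on the budget constraint
        mu = max(0.0, mu + eps * (cost - rate))
    return utility, spend

random.seed(1)
T = 1000
vals = [random.random() for _ in range(T)]
prices = [random.random() * 0.8 for _ in range(T)]
u, s = paced_bidding(vals, prices, budget=50.0, T=T)
```

The paper's contribution is to manage one budget across several such auctions, some of which are not incentive-compatible, which changes the optimal shading rule per platform.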
    A Holistic Approach to Unifying Automatic Concept Extraction and Concept Importance Estimation. (arXiv:2306.07304v1 [cs.LG])
    In recent years, concept-based approaches have emerged as some of the most promising explainability methods to help us interpret the decisions of Artificial Neural Networks (ANNs). These methods seek to discover intelligible visual 'concepts' buried within the complex patterns of ANN activations in two key steps: (1) concept extraction followed by (2) importance estimation. While these two steps are shared across methods, they all differ in their specific implementations. Here, we introduce a unifying theoretical framework that comprehensively defines and clarifies these two steps. This framework offers several advantages as it allows us: (i) to propose new evaluation metrics for comparing different concept extraction approaches; (ii) to leverage modern attribution methods and evaluation metrics to extend and systematically evaluate state-of-the-art concept-based approaches and importance estimation techniques; (iii) to derive theoretical guarantees regarding the optimality of such methods. We further leverage our framework to try to tackle a crucial question in explainability: how to efficiently identify clusters of data points that are classified based on a similar shared strategy. To illustrate these findings and to highlight the main strategies of a model, we introduce a visual representation called the strategic cluster graph. Finally, we present https://serre-lab.github.io/Lens, a dedicated website that offers a complete compilation of these visualizations for all classes of the ImageNet dataset.
    Decoding Brain Motor Imagery with various Machine Learning techniques. (arXiv:2306.07519v1 [cs.HC])
    Motor imagery (MI) is a well-documented technique used by subjects in BCI (Brain Computer Interface) experiments to modulate brain activity within the motor cortex and surrounding areas of the brain. In our term project, we conducted an experiment in which the subjects were instructed to perform motor imagery that would be divided into two classes (Right and Left). Experiments were conducted with two different types of electrodes (Gel and POLiTag) and data for individual subjects was collected. In this paper, we will apply different machine learning (ML) methods to create a decoder based on offline training data that uses evidence accumulation to predict a subject's intent from their modulated brain signals in real-time.
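The evidence-accumulation decoder described above can be sketched as a sequential log-likelihood-ratio test; the per-frame class probabilities and decision threshold below are made-up values for illustration, not fitted to the paper's EEG data:

```python
import math

def accumulate_evidence(frame_probs, threshold=5.0):
    """Sum per-frame log-likelihood ratios log p(frame|Right)/p(frame|Left)
    until the running total crosses +/- threshold (a sequential decision)."""
    total = 0.0
    for p_right, p_left in frame_probs:
        total += math.log(p_right) - math.log(p_left)
        if total >= threshold:
            return "Right", total
        if total <= -threshold:
            return "Left", total
    return "Undecided", total

# a stream of frames each weakly favoring the Right class
frames = [(0.6, 0.4)] * 30
decision, score = accumulate_evidence(frames)
```

In practice the per-frame probabilities would come from the trained ML classifier, and the threshold trades off decision speed against accuracy.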
    Homophily modulates double descent generalization in graph convolution networks. (arXiv:2212.13069v2 [cs.LG] UPDATED)
    Graph neural networks are among the most successful machine learning models for relational datasets like metabolic, transportation, and social networks. Yet the determinants of their strong generalization for diverse interactions encoded in the data are not well understood. Methods from statistical learning theory do not explain emergent phenomena such as double descent or the dependence of risk on the nature of interactions. We use analytical tools from statistical physics and random matrix theory to precisely characterize generalization in simple graph convolution networks on the contextual stochastic block model. The derived curves are phenomenologically rich: they explain the distinction between learning on homophilic and heterophilic graphs, and they predict double descent, whose existence in GNNs has been questioned by recent work. We show how risk depends on the interplay between the noise in the graph, noise in the features, and the proportion of nodes used for training. Our analysis predicts qualitative behavior not only of a stylized graph learning model but also of complex GNNs on messy real-world datasets. As a case in point, we use these analytic insights about heterophily and self-loop signs to improve performance of state-of-the-art graph convolution networks on several heterophilic benchmarks by a simple addition of negative self-loop filters.
    Speech Enhancement and Dereverberation with Diffusion-based Generative Models. (arXiv:2208.05830v2 [eess.AS] UPDATED)
    In this work, we build upon our previous publication and use diffusion-based generative models for speech enhancement. We present a detailed overview of the diffusion process that is based on a stochastic differential equation and delve into an extensive theoretical examination of its implications. Opposed to usual conditional generation tasks, we do not start the reverse process from pure Gaussian noise but from a mixture of noisy speech and Gaussian noise. This matches our forward process which moves from clean speech to noisy speech by including a drift term. We show that this procedure enables using only 30 diffusion steps to generate high-quality clean speech estimates. By adapting the network architecture, we are able to significantly improve the speech enhancement performance, indicating that the network, rather than the formalism, was the main limitation of our original approach. In an extensive cross-dataset evaluation, we show that the improved method can compete with recent discriminative models and achieves better generalization when evaluating on a different corpus than used for training. We complement the results with an instrumental evaluation using real-world noisy recordings and a listening experiment, in which our proposed method is rated best. Examining different sampler configurations for solving the reverse process allows us to balance the performance and computational speed of the proposed method. Moreover, we show that the proposed method is also suitable for dereverberation and thus not limited to additive background noise removal. Code and audio examples are available online, see https://github.com/sp-uhh/sgmse
    MOFI: Learning Image Representations from Noisy Entity Annotated Images. (arXiv:2306.07952v1 [cs.CV])
    We present MOFI, a new vision foundation model designed to learn image representations from noisy entity annotated images. MOFI differs from previous work in two key aspects: ($i$) pre-training data, and ($ii$) training recipe. Regarding data, we introduce a new approach to automatically assign entity labels to images from noisy image-text pairs. Our approach involves employing a named entity recognition model to extract entities from the alt-text, and then using a CLIP model to select the correct entities as labels of the paired image. The approach is simple, does not require costly human annotation, and can be readily scaled up to billions of image-text pairs mined from the web. Through this method, we have created Image-to-Entities (I2E), a new large-scale dataset with 1 billion images and 2 million distinct entities, covering rich visual concepts in the wild. Building upon the I2E dataset, we study different training recipes, including supervised pre-training, contrastive pre-training, and multi-task learning. For contrastive pre-training, we treat entity names as free-form text, and further enrich them with entity descriptions. Experiments show that supervised pre-training with large-scale fine-grained entity labels is highly effective for image retrieval tasks, and multi-task training further improves the performance. The final MOFI model achieves 86.66% mAP on the challenging GPR1200 dataset, surpassing the previous state-of-the-art performance of 72.19% from OpenAI's CLIP model. Further experiments on zero-shot and linear probe image classification also show that MOFI outperforms a CLIP model trained on the original image-text data, demonstrating the effectiveness of the I2E dataset in learning strong image representations.
    User-defined Event Sampling and Uncertainty Quantification in Diffusion Models for Physical Dynamical Systems. (arXiv:2306.07526v1 [cs.LG])
    Diffusion models are a class of probabilistic generative models that have been widely used as a prior for image processing tasks like text conditional generation and inpainting. We demonstrate that these models can be adapted to make predictions and provide uncertainty quantification for chaotic dynamical systems. In these applications, diffusion models can implicitly represent knowledge about outliers and extreme events; however, querying that knowledge through conditional sampling or measuring probabilities is surprisingly difficult. Existing methods for conditional sampling at inference time seek mainly to enforce the constraints, which is insufficient to match the statistics of the distribution or compute the probability of the chosen events. To achieve these ends, optimally one would use the conditional score function, but its computation is typically intractable. In this work, we develop a probabilistic approximation scheme for the conditional score function which provably converges to the true distribution as the noise level decreases. With this scheme we are able to sample conditionally on nonlinear user-defined events at inference time, and to match data statistics even when sampling from the tails of the distribution.
    Artificial Benchmark for Community Detection with Outliers (ABCD+o). (arXiv:2301.05749v2 [cs.SI] UPDATED)
    The Artificial Benchmark for Community Detection graph (ABCD) is a random graph model with community structure and power-law distribution for both degrees and community sizes. The model generates graphs with similar properties as the well-known LFR one, and its main parameter $\xi$ can be tuned to mimic its counterpart in the LFR model, the mixing parameter $\mu$. In this paper, we extend the ABCD model to include potential outliers. We perform some exploratory experiments on both the new ABCD+o model as well as a real-world network to show that outliers possess some desired, distinguishable properties.
    How to Reuse and Compose Knowledge for a Lifetime of Tasks: A Survey on Continual Learning and Functional Composition. (arXiv:2207.07730v2 [cs.LG] UPDATED)
    A major goal of artificial intelligence (AI) is to create an agent capable of acquiring a general understanding of the world. Such an agent would require the ability to continually accumulate and build upon its knowledge as it encounters new experiences. Lifelong or continual learning addresses this setting, whereby an agent faces a continual stream of problems and must strive to capture the knowledge necessary for solving each new task it encounters. If the agent is capable of accumulating knowledge in some form of compositional representation, it could then selectively reuse and combine relevant pieces of knowledge to construct novel solutions. Despite the intuitive appeal of this simple idea, the literatures on lifelong learning and compositional learning have proceeded largely separately. In an effort to promote developments that bridge between the two fields, this article surveys their respective research landscapes and discusses existing and future connections between them.
    Identification of Nonlinear Latent Hierarchical Models. (arXiv:2306.07916v1 [cs.LG])
    Identifying latent variables and causal structures from observational data is essential to many real-world applications involving biological data, medical data, and unstructured data such as images and languages. However, this task can be highly challenging, especially when observed variables are generated by causally related latent variables and the relationships are nonlinear. In this work, we investigate the identification problem for nonlinear latent hierarchical causal models in which observed variables are generated by a set of causally related latent variables, and some latent variables may not have observed children. We show that the identifiability of both causal structure and latent variables can be achieved under mild assumptions: on causal structures, we allow for the existence of multiple paths between any pair of variables in the graph, which relaxes latent tree assumptions in prior work; on structural functions, we do not make parametric assumptions, thus permitting general nonlinearity and multi-dimensional continuous variables. Specifically, we first develop a basic identification criterion in the form of novel identifiability guarantees for an elementary latent variable model. Leveraging this criterion, we show that both causal structures and latent variables of the hierarchical model can be identified asymptotically by explicitly constructing an estimation procedure. To the best of our knowledge, our work is the first to establish identifiability guarantees for both causal structures and latent variables in nonlinear latent hierarchical models.
    Multi-Fidelity Multi-Armed Bandits Revisited. (arXiv:2306.07761v1 [cs.LG])
    We study the multi-fidelity multi-armed bandit (MF-MAB), an extension of the canonical multi-armed bandit (MAB) problem. MF-MAB allows each arm to be pulled with different costs (fidelities) and observation accuracy. We study both the best arm identification with fixed confidence (BAI) and the regret minimization objectives. For BAI, we present (a) a cost complexity lower bound, (b) an algorithmic framework with two alternative fidelity selection procedures, and (c) both procedures' cost complexity upper bounds. From both cost complexity bounds of MF-MAB, one can recover the standard sample complexity bounds of the classic (single-fidelity) MAB. For regret minimization of MF-MAB, we propose a new regret definition, prove its problem-independent regret lower bound $\Omega(K^{1/3}\Lambda^{2/3})$ and problem-dependent lower bound $\Omega(K\log \Lambda)$, where $K$ is the number of arms and $\Lambda$ is the decision budget in terms of cost, and devise an elimination-based algorithm whose worst-cost regret upper bound matches its corresponding lower bound up to some logarithmic terms, and whose problem-dependent bound matches its corresponding lower bound in terms of $\Lambda$.
    PaVa: a novel Path-based Valley-seeking clustering algorithm. (arXiv:2306.07503v1 [cs.LG])
    Clustering methods are being applied to a wider range of scenarios involving more complex datasets, where the shapes of clusters tend to be arbitrary. In this paper, we propose a novel Path-based Valley-seeking clustering algorithm for arbitrarily shaped clusters. This work aims to seek the valleys among clusters and then individually extract clusters. Three vital techniques are used in this algorithm. First, path distance (minmax distance) is employed to transform the irregular boundaries among clusters, that is, density valleys, into perfect spherical shells. Second, a suitable density measurement, $k$-distance, is employed to adjust the Minimum Spanning Tree, by which a robust minmax distance is calculated. Third, we seek the transformed density valleys by determining their centers and radii. As a result, the clusters are wrapped in spherical shells after the distance transformation, making the extraction process efficient even with clusters of arbitrary shape; the adjusted Minimum Spanning Tree enhances the robustness of the minmax distance under different kinds of noise; and the number of clusters does not need to be specified manually, thanks to the individual extraction process. After applying the proposed algorithm to several commonly used synthetic datasets, the results indicate that the Path-based Valley-seeking algorithm is accurate and efficient. The algorithm is based on the dissimilarity of objects, so it can be applied to a wide range of fields. Its performance on real-world datasets illustrates its versatility.
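The path (minmax) distance used above can be computed with a Floyd-Warshall-style max-min recurrence; the toy 1-d point set below is an illustrative assumption, and the paper additionally computes the distance on an adjusted Minimum Spanning Tree rather than the dense graph used here:

```python
import numpy as np

def minmax_distance(D):
    """Path (minmax) distance: over all paths between i and j, minimize the
    largest edge; for all pairs, M[i,j] = min_k max(M[i,k], M[k,j])."""
    M = D.astype(float).copy()
    n = len(M)
    for k in range(n):
        M = np.minimum(M, np.maximum(M[:, k:k + 1], M[k:k + 1, :]))
    return M

# two tight pairs of points joined by one long gap
pts = np.array([[0.0], [1.0], [10.0], [11.0]])
D = np.abs(pts - pts.T)
M = minmax_distance(D)
```

Here the minmax distance between the two far groups collapses to the single bridging edge (length 9), while within-group distances stay small, which is exactly the property that turns density valleys into well-separated shells.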
    Temporal Gradient Inversion Attacks with Robust Optimization. (arXiv:2306.07883v1 [cs.LG])
    Federated Learning (FL) has emerged as a promising approach for collaborative model training without sharing private data. However, privacy concerns regarding information exchanged during FL have received significant research attention. Gradient Inversion Attacks (GIAs) have been proposed to reconstruct the private data retained by local clients from the exchanged gradients. As data dimensionality and model complexity increase, however, data reconstruction by GIAs becomes harder. Existing methods adopt prior knowledge about private data to overcome those challenges. In this paper, we first observe that GIAs with gradients from a single iteration fail to reconstruct private data due to insufficient dimensions of leaked gradients, complex model architectures, and invalid gradient information. We investigate a Temporal Gradient Inversion Attack with a Robust Optimization framework, called TGIAs-RO, which recovers private data without any prior knowledge by leveraging multiple temporal gradients. To eliminate the negative impacts of outliers, e.g., invalid gradients, on collaborative optimization, robust statistics are proposed. Theoretical guarantees on the recovery performance and robustness of TGIAs-RO against invalid gradients are also provided. Extensive empirical results on MNIST, CIFAR10, ImageNet and Reuters 21578 datasets show that the proposed TGIAs-RO with 10 temporal gradients improves reconstruction performance compared to state-of-the-art methods, even for large batch sizes (up to 128), complex models like ResNet18, and large datasets like ImageNet (224×224 pixels). Furthermore, the proposed attack method inspires further exploration of privacy-preserving methods in the context of FL.
    Differentially Private One Permutation Hashing and Bin-wise Consistent Weighted Sampling. (arXiv:2306.07674v1 [stat.ML])
    Minwise hashing (MinHash) is a standard algorithm widely used in the industry, for large-scale search and learning applications with the binary (0/1) Jaccard similarity. One common use of MinHash is for processing massive n-gram text representations so that practitioners do not have to materialize the original data (which would be prohibitive). Another popular use of MinHash is for building hash tables to enable sub-linear time approximate near neighbor (ANN) search. MinHash has also been used as a tool for building large-scale machine learning systems. The standard implementation of MinHash requires applying $K$ random permutations. In comparison, the method of one permutation hashing (OPH) is an efficient alternative to MinHash which splits the data vectors into $K$ bins and generates hash values within each bin. OPH is substantially more efficient and also more convenient to use. In this paper, we combine differential privacy (DP) with OPH (as well as MinHash), to propose the DP-OPH framework with three variants: DP-OPH-fix, DP-OPH-re and DP-OPH-rand, depending on which densification strategy is adopted to deal with empty bins in OPH. A detailed roadmap to the algorithm design is presented along with the privacy analysis. An analytical comparison of our proposed DP-OPH methods with DP minwise hashing (DP-MH) is provided to justify the advantage of DP-OPH. Experiments on similarity search confirm the merits of DP-OPH, and guide the choice of the proper variant in different practical scenarios. Our technique is also extended to bin-wise consistent weighted sampling (BCWS) to develop a new DP algorithm called DP-BCWS for non-binary data. Experiments on classification tasks demonstrate that DP-BCWS is able to achieve excellent utility at around $\epsilon = 5\sim 10$, where $\epsilon$ is the standard parameter in the language of $(\epsilon, \delta)$-DP.
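The plain (non-private) OPH step that the DP framework builds on can be sketched as follows; the universal-hash parameters, universe size, and the simple "skip empty bins" estimator below are illustrative assumptions, whereas the paper's variants differ precisely in how empty bins are densified:

```python
import random

def oph_sketch(items, K=64, universe=2**20, seed=7):
    """One permutation hashing: hash every element once, split the hash
    range into K bins, keep the minimum hash per bin (None if empty)."""
    rng = random.Random(seed)
    a, b = rng.randrange(1, universe), rng.randrange(universe)
    bin_size = universe // K
    sketch = [None] * K
    for x in items:
        h = (a * hash(x) + b) % universe
        k = min(h // bin_size, K - 1)
        if sketch[k] is None or h < sketch[k]:
            sketch[k] = h
    return sketch

def oph_jaccard(s1, s2):
    """Estimate Jaccard similarity from jointly non-empty matching bins."""
    both = [(u, v) for u, v in zip(s1, s2) if u is not None and v is not None]
    return sum(u == v for u, v in both) / len(both) if both else 0.0

A = set(range(1000))
B = set(range(500, 1500))   # true Jaccard = 500/1500 ≈ 0.33
est = oph_jaccard(oph_sketch(A), oph_sketch(B))
```

Each input element is hashed exactly once, which is why OPH is roughly $K$ times cheaper than classic MinHash with $K$ independent permutations.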
    Learning distinct features helps, provably. (arXiv:2106.06012v3 [cs.LG] UPDATED)
    We study the diversity of the features learned by a two-layer neural network trained with the least squares loss. We measure the diversity by the average $L_2$-distance between the hidden-layer features and theoretically investigate how learning non-redundant distinct features affects the performance of the network. To do so, we derive novel generalization bounds depending on feature diversity based on Rademacher complexity for such networks. Our analysis proves that more distinct features at the network's units within the hidden layer lead to better generalization. We also show how to extend our results to deeper networks and different losses.
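The diversity measure itself is straightforward to compute; the random activations below are an illustrative stand-in for a trained network's hidden layer, not the paper's experimental setup:

```python
import numpy as np

def average_feature_distance(H):
    """Average pairwise L2 distance between hidden-unit feature vectors.
    H has one row per input sample and one column per hidden unit."""
    F = H.T                                      # one row per hidden unit
    d = np.linalg.norm(F[:, None, :] - F[None, :, :], axis=-1)
    n = len(F)
    return d[np.triu_indices(n, k=1)].mean()     # mean over distinct pairs

rng = np.random.default_rng(0)
H_redundant = np.tile(rng.normal(size=(100, 1)), (1, 8))  # 8 identical units
H_diverse = rng.normal(size=(100, 8))                     # 8 distinct units
```

Fully redundant units score zero diversity, while independent units score high, matching the quantity the generalization bounds are stated in terms of.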
    Model selection of polynomial kernel regression. (arXiv:1503.02143v2 [cs.LG] UPDATED)
    Polynomial kernel regression is one of the standard and state-of-the-art learning strategies. However, as is well known, the choices of the degree of polynomial kernel and the regularization parameter are still open in the realm of model selection. The first aim of this paper is to develop a strategy to select these parameters. On one hand, based on the worst-case learning rate analysis, we show that the regularization term in polynomial kernel regression is not necessary. In other words, the regularization parameter can decrease arbitrarily fast when the degree of the polynomial kernel is suitable tuned. On the other hand,taking account of the implementation of the algorithm, the regularization term is required. Summarily, the effect of the regularization term in polynomial kernel regression is only to circumvent the " ill-condition" of the kernel matrix. Based on this, the second purpose of this paper is to propose a new model selection strategy, and then design an efficient learning algorithm. Both theoretical and experimental analysis show that the new strategy outperforms the previous one. Theoretically, we prove that the new learning strategy is almost optimal if the regression function is smooth. Experimentally, it is shown that the new strategy can significantly reduce the computational burden without loss of generalization capability.
    Contrastive Corpus Attribution for Explaining Representations. (arXiv:2210.00107v2 [cs.LG] UPDATED)
    Despite the widespread use of unsupervised models, very few methods are designed to explain them. Most explanation methods explain a scalar model output. However, unsupervised models output representation vectors, the elements of which are not good candidates to explain because they lack semantic meaning. To bridge this gap, recent works defined a scalar explanation output: a dot product-based similarity in the representation space to the sample being explained (i.e., an explicand). Although this enabled explanations of unsupervised models, the interpretation of this approach can still be opaque because similarity to the explicand's representation may not be meaningful to humans. To address this, we propose contrastive corpus similarity, a novel and semantically meaningful scalar explanation output based on a reference corpus and a contrasting foil set of samples. We demonstrate that contrastive corpus similarity is compatible with many post-hoc feature attribution methods to generate COntrastive COrpus Attributions (COCOA) and quantitatively verify that features important to the corpus are identified. We showcase the utility of COCOA in two ways: (i) we draw insights by explaining augmentations of the same image in a contrastive learning setting (SimCLR); and (ii) we perform zero-shot object localization by explaining the similarity of image representations to jointly learned text representations (CLIP).
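The contrastive corpus similarity reduces to a simple difference of mean dot products; the synthetic representations below are illustrative assumptions standing in for encoder outputs:

```python
import numpy as np

def contrastive_corpus_similarity(z, corpus, foil):
    """Scalar explanation output: mean dot-product similarity of a
    representation z to a reference corpus, minus its mean similarity
    to a contrasting foil set."""
    return float((corpus @ z).mean() - (foil @ z).mean())

rng = np.random.default_rng(0)
corpus = rng.normal(loc=1.0, size=(20, 16))   # e.g. representations of one class
foil = rng.normal(loc=-1.0, size=(20, 16))    # unrelated reference samples
z_like_corpus = corpus.mean(0)
score = contrastive_corpus_similarity(z_like_corpus, corpus, foil)
```

This scalar can then be fed to any post-hoc feature attribution method to obtain the contrastive corpus attributions.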
    VISION Datasets: A Benchmark for Vision-based InduStrial InspectiON. (arXiv:2306.07890v1 [cs.CV])
    Despite progress in vision-based inspection algorithms, real-world industrial challenges -- specifically in data availability, quality, and complex production requirements -- often remain under-addressed. We introduce the VISION Datasets, a diverse collection of 14 industrial inspection datasets, uniquely poised to meet these challenges. Unlike previous datasets, VISION brings versatility to defect detection, offering annotation masks across all splits and catering to various detection methodologies. Our datasets also feature instance-segmentation annotation, enabling precise defect identification. With a total of 18k images encompassing 44 defect types, VISION strives to mirror a wide range of real-world production scenarios. By supporting two ongoing challenge competitions on the VISION Datasets, we hope to foster further advancements in vision-based industrial inspection.
    Exact Mean Square Linear Stability Analysis for SGD. (arXiv:2306.07850v1 [cs.LG])
The dynamical stability of optimization methods in the vicinity of minima of the loss has recently attracted significant attention. For gradient descent (GD), stable convergence is possible only to minima that are sufficiently flat w.r.t. the step size, and those have been linked with favorable properties of the trained model. However, while the stability threshold of GD is well-known, to date, no explicit expression has been derived for the exact threshold of stochastic GD (SGD). In this paper, we derive such a closed-form expression. Specifically, we provide an explicit condition on the step size $\eta$ that is both necessary and sufficient for the stability of SGD in the mean square sense. Our analysis sheds light on the precise role of the batch size $B$. Particularly, we show that the stability threshold is a monotonically non-decreasing function of the batch size, which means that reducing the batch size can only hurt stability. Furthermore, we show that SGD's stability threshold is equivalent to that of a process which takes in each iteration a full batch gradient step w.p. $1-p$, and a single sample gradient step w.p. $p$, where $p \approx 1/B $. This indicates that even with moderate batch sizes, SGD's stability threshold is very close to that of GD's. Finally, we prove simple necessary conditions for stability, which depend on the batch size, and are easier to compute than the precise threshold. We demonstrate our theoretical findings through experiments on the MNIST dataset.
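In display form, the equivalent process described in the abstract can be sketched as follows (the notation $L$ for the full-batch loss and $\ell_i$ for the per-sample loss is assumed here, not taken from the paper):

```latex
\theta_{t+1} =
\begin{cases}
\theta_t - \eta\, \nabla L(\theta_t), & \text{with probability } 1 - p,\\[2pt]
\theta_t - \eta\, \nabla \ell_{i_t}(\theta_t), \quad i_t \sim \mathrm{Unif}\{1,\dots,n\}, & \text{with probability } p \approx 1/B.
\end{cases}
```

For moderate $B$, the single-sample step fires rarely ($p \approx 1/B$), which is the intuition behind SGD's threshold being close to GD's.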
    Improving Frame-level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition. (arXiv:2306.07949v1 [eess.AS])
End-to-end (E2E) systems have shown comparable performance to hybrid systems for automatic speech recognition (ASR). Word timings, as a by-product of ASR, are essential in many applications, especially for subtitling and computer-aided pronunciation training. In this paper, we improve the frame-level classifier for word timings in an E2E system by introducing label priors in the connectionist temporal classification (CTC) loss, adopted from prior works, and by combining low-level Mel-scale filter banks with high-level ASR encoder output as the input feature. On an internal Chinese corpus, the proposed method achieves 95.68%/94.18% on the word timing accuracy metrics, compared to 93.0%/90.22% for the hybrid system. It also surpasses a previous E2E approach by an absolute 4.80%/8.02% on the same metrics across 7 languages. In addition, we further improve word timing accuracy by delaying CTC peaks with frame-wise knowledge distillation, though we experiment only on LibriSpeech.
    A New Probabilistic Distance Metric With Application In Gaussian Mixture Reduction. (arXiv:2306.07309v1 [cs.LG])
This paper presents a new distance metric to compare two continuous probability density functions. The main advantage of this metric is that, unlike other statistical measurements, it can provide an analytic, closed-form expression for a mixture of Gaussian distributions while satisfying all metric properties. These characteristics enable fast, stable, and efficient calculations, which are highly desirable in real-world signal processing applications. The application in mind is Gaussian Mixture Reduction (GMR), which is widely used in density estimation, recursive tracking, and belief propagation. To address this problem, we developed a novel algorithm dubbed the Optimization-based Greedy GMR (OGGMR), which employs our metric as a criterion to approximate a high-order Gaussian mixture with a lower order. Experimental results show that the OGGMR algorithm is significantly faster and more efficient than state-of-the-art GMR algorithms while retaining the geometric shape of the original mixture.
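For readers unfamiliar with GMR, the basic building block of a greedy reduction is merging a pair of components into one Gaussian that preserves the mixture's first two moments. The sketch below shows this standard moment-matching merge for 1-D components; it is textbook material, not the paper's metric, which instead decides *which* pair to merge:

```python
def merge_gaussians(w1, m1, v1, w2, m2, v2):
    """Moment-preserving merge of two weighted 1-D Gaussian components
    (w: weight, m: mean, v: variance). Returns one component whose
    weight, mean, and variance match the two-component sub-mixture."""
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    # Match the second moment of the sub-mixture, then subtract m^2.
    v = (w1 * (v1 + m1 ** 2) + w2 * (v2 + m2 ** 2)) / w - m ** 2
    return w, m, v
```

Merging two well-separated components inflates the variance (e.g., two unit-variance components at means -1 and 1 merge to variance 2), which is exactly the distortion a good reduction criterion tries to keep small.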
    Hyperbolic Graph Diffusion Model for Molecule Generation. (arXiv:2306.07618v1 [cs.LG])
Recently, diffusion models have achieved remarkable performance in data generation, e.g., generating high-quality images. Nevertheless, molecules often have complex non-Euclidean spatial structures, with behavior that changes dynamically and unpredictably. Most existing diffusion models rely heavily on computing the probability distribution, i.e., a Gaussian distribution, in Euclidean space, which cannot capture the internal non-Euclidean structures of molecules, especially the hierarchical structures of the implicit manifold surface represented by molecules. It has been observed that complex hierarchical structures become more prominent and easier to capture in hyperbolic embedding space. In order to leverage both the data generation power of diffusion models and the strong capability of hyperbolic embeddings to extract complex geometric features, we propose to extend the diffusion model to hyperbolic manifolds for molecule generation, namely, the Hyperbolic Graph Diffusion Model (HGDM). The proposed HGDM employs a hyperbolic variational autoencoder to generate hyperbolic hidden representations of nodes, and a score-based hyperbolic graph neural network is then used to learn the distribution in hyperbolic space. Numerical experimental results show that the proposed HGDM achieves higher performance on several molecular datasets than state-of-the-art methods.
    Theoretical Foundations of Adversarially Robust Learning. (arXiv:2306.07723v1 [cs.LG])
Despite extraordinary progress, current machine learning systems have been shown to be brittle against adversarial examples: seemingly innocuous but carefully crafted perturbations of test examples that cause machine learning predictors to misclassify. Can we learn predictors robust to adversarial examples, and how? There has been much empirical interest in this contemporary challenge in machine learning, and in this thesis we address it from a theoretical perspective. We explore what robustness properties we can hope to guarantee against adversarial examples and develop an understanding of how to guarantee them algorithmically. We illustrate the need to go beyond traditional approaches and principles such as empirical risk minimization and uniform convergence, and make contributions that can be categorized as follows: (1) introducing problem formulations capturing aspects of emerging practical challenges in robust learning, (2) designing new learning algorithms with provable robustness guarantees, and (3) characterizing the complexity of robust learning and fundamental limitations on the performance of any algorithm.
    Multi-objective Molecular Optimization for Opioid Use Disorder Treatment Using Generative Network Complex. (arXiv:2306.07484v1 [cs.LG])
Opioid Use Disorder (OUD) has emerged as a significant global public health issue, with complex multifaceted conditions. Due to the lack of effective treatment options for various conditions, there is a pressing need for the discovery of new medications. In this study, we propose a deep generative model that combines a stochastic differential equation (SDE)-based diffusion modeling with the latent space of a pretrained autoencoder model. The molecular generator enables efficient generation of molecules that are effective on multiple targets, specifically the mu, kappa, and delta opioid receptors. Furthermore, we assess the ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties of the generated molecules to identify drug-like compounds. To enhance the pharmacokinetic properties of some lead compounds, we employ a molecular optimization approach. We obtain a diverse set of drug-like molecules. We construct binding affinity predictors by integrating molecular fingerprints derived from autoencoder embeddings, transformer embeddings, and topological Laplacians with advanced machine learning algorithms. Further experimental studies are needed to evaluate the pharmacological effects of these drug-like compounds for OUD treatment. Our machine learning platform serves as a valuable tool in designing and optimizing effective molecules for addressing OUD.
    G-invariant diffusion maps. (arXiv:2306.07350v1 [cs.LG])
The diffusion maps embedding of data lying on a manifold has shown success in tasks ranging from dimensionality reduction and clustering to data visualization. In this work, we consider embedding data sets that were sampled from a manifold which is closed under the action of a continuous matrix group. An example of such a data set is a collection of images whose planar rotations are arbitrary. The G-invariant graph Laplacian, introduced in a previous work of the authors, admits eigenfunctions in the form of tensor products between the elements of the irreducible unitary representations of the group and eigenvectors of certain matrices. We employ these eigenfunctions to derive diffusion maps that intrinsically account for the group action on the data. In particular, we construct both equivariant and invariant embeddings, which can be used naturally to cluster and align the data points. We demonstrate the effectiveness of our construction with simulated data.
    Making forecasting self-learning and adaptive -- Pilot forecasting rack. (arXiv:2306.07305v1 [cs.LG])
Retail sales and price projections are typically based on time series forecasting. For some product categories, the accuracy of demand forecasts achieved is low, negatively impacting inventory, transport, and replenishment planning. This paper presents our findings based on a proactive pilot exercise to explore ways to help retailers improve forecast accuracy for such product categories. We evaluated opportunities for algorithmic interventions to improve forecast accuracy based on a sample product category, Knitwear. The Knitwear product category has a current demand forecast accuracy from non-AI models in the range of 60%. We explored how to improve the forecast accuracy using a rack approach. To generate forecasts, our decision model dynamically selects the best algorithm from an algorithm rack based on performance for a given state and context. Outcomes from our AI/ML forecasting model built using advanced feature engineering show an increase in the accuracy of demand forecasts for the Knitwear product category by 20%, taking the overall accuracy to 80%. Because our rack comprises algorithms that cater to a range of customer data sets, the forecasting model can be easily tailored for specific customer contexts.
    Progressive Class-Wise Attention (PCA) Approach for Diagnosing Skin Lesions. (arXiv:2306.07300v1 [cs.LG])
Skin cancer holds the highest incidence rate among all cancers globally. The importance of early detection cannot be overstated, as late-stage cases can be lethal. Classifying skin lesions, however, presents several challenges due to the many variations they can exhibit, such as differences in colour, shape, and size, significant variation within the same class, and notable similarities between different classes. This paper introduces a novel class-wise attention technique that equally regards each class while unearthing more specific details about skin lesions. This attention mechanism is progressively used to amalgamate discriminative feature details from multiple scales. The introduced technique demonstrated impressive performance, surpassing more than 15 cutting-edge methods, including the winners of the HAM10000 and ISIC 2019 leaderboards. It achieved an impressive accuracy of 97.40% on the HAM10000 dataset and 94.9% on the ISIC 2019 dataset.
    Offline Policy Evaluation and Optimization under Confounding. (arXiv:2211.16583v3 [stat.ML] UPDATED)
Evaluating and optimizing policies in the presence of unobserved confounders is a problem of growing interest in offline reinforcement learning. Using conventional methods for offline RL in the presence of confounding can not only lead to poor decisions and poor policies, but can also have disastrous effects in critical applications such as healthcare and education. We map out the landscape of offline policy evaluation for confounded MDPs, distinguishing assumptions on confounding based on their time-evolution and effect on the data-collection policies. We determine when consistent value estimates are not achievable, providing and discussing algorithms to estimate lower bounds with guarantees in those cases. When consistent estimates are achievable, we provide sample complexity guarantees. We also present new algorithms for offline policy improvement and prove local convergence guarantees. Finally, we experimentally evaluate our algorithms on gridworld and a simulated healthcare setting of managing sepsis patients. We note that in gridworld, our model-based method provides tighter lower bounds than existing methods, while in the sepsis simulator, our methods significantly outperform confounder-oblivious benchmarks.
    Kernelized Reinforcement Learning with Order Optimal Regret Bounds. (arXiv:2306.07745v1 [cs.LG])
Reinforcement learning (RL) has shown empirical success in various real world settings with complex models and large state-action spaces. The existing analytical results, however, typically focus on settings with a small number of state-actions or simple models such as linearly modeled state-action value functions. To derive RL policies that efficiently handle large state-action spaces with more general value functions, some recent works have considered nonlinear function approximation using kernel ridge regression. We propose $\pi$-KRVI, an optimistic modification of least-squares value iteration, when the state-action value function is represented by an RKHS. We prove the first order-optimal regret guarantees under a general setting. Our results show an improvement over the state of the art that is polynomial in the number of episodes. In particular, with highly non-smooth kernels (such as the Neural Tangent kernel or some Mat\'ern kernels) the existing results lead to trivial (superlinear in the number of episodes) regret bounds. We show a sublinear regret bound that is order optimal in the case of Mat\'ern kernels, where a lower bound on regret is known.
    Additive Causal Bandits with Unknown Graph. (arXiv:2306.07858v1 [cs.LG])
    We explore algorithms to select actions in the causal bandit setting where the learner can choose to intervene on a set of random variables related by a causal graph, and the learner sequentially chooses interventions and observes a sample from the interventional distribution. The learner's goal is to quickly find the intervention, among all interventions on observable variables, that maximizes the expectation of an outcome variable. We depart from previous literature by assuming no knowledge of the causal graph except that latent confounders between the outcome and its ancestors are not present. We first show that the unknown graph problem can be exponentially hard in the parents of the outcome. To remedy this, we adopt an additional additive assumption on the outcome which allows us to solve the problem by casting it as an additive combinatorial linear bandit problem with full-bandit feedback. We propose a novel action-elimination algorithm for this setting, show how to apply this algorithm to the causal bandit problem, provide sample complexity bounds, and empirically validate our findings on a suite of randomly generated causal models, effectively showing that one does not need to explicitly learn the parents of the outcome to identify the best intervention.
    Exact Solutions of a Deep Linear Network. (arXiv:2202.04777v7 [stat.ML] UPDATED)
This work finds the analytical expression of the global minima of a deep linear network with weight decay and stochastic neurons, a fundamental model for understanding the landscape of neural networks. Our result implies that the origin is a special point in the deep neural network loss landscape, where highly nonlinear phenomena emerge. We show that weight decay strongly interacts with the model architecture and can create bad minima at zero in a network with more than $1$ hidden layer, qualitatively different from a network with only $1$ hidden layer. Practically, our result implies that common deep learning initialization methods are insufficient to ease the optimization of neural networks in general.
    Nonparametric extensions of randomized response for private confidence sets. (arXiv:2202.08728v3 [stat.ME] UPDATED)
    This work derives methods for performing nonparametric, nonasymptotic statistical inference for population means under the constraint of local differential privacy (LDP). Given bounded observations $(X_1, \dots, X_n)$ with mean $\mu^\star$ that are privatized into $(Z_1, \dots, Z_n)$, we present confidence intervals (CI) and time-uniform confidence sequences (CS) for $\mu^\star$ when only given access to the privatized data. To achieve this, we introduce a nonparametric and sequentially interactive generalization of Warner's famous ``randomized response'' mechanism, satisfying LDP for arbitrary bounded random variables, and then provide CIs and CSs for their means given access to the resulting privatized observations. For example, our results yield private analogues of Hoeffding's inequality in both fixed-time and time-uniform regimes. We extend these Hoeffding-type CSs to capture time-varying (non-stationary) means, and conclude by illustrating how these methods can be used to conduct private online A/B tests.
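For concreteness, here is a minimal sketch of the classical binary Warner mechanism that the paper's nonparametric, sequentially interactive mechanism generalizes. The function names and the choice $p = e^{\epsilon}/(1+e^{\epsilon})$ are a standard textbook formulation, not taken from the paper:

```python
import math
import random

def randomized_response(x, eps):
    """Warner's mechanism for one bit x in {0, 1}: report truthfully with
    probability p = e^eps / (1 + e^eps), otherwise flip the bit.
    This satisfies eps-local differential privacy for a single bit."""
    p = math.exp(eps) / (1.0 + math.exp(eps))
    return x if random.random() < p else 1 - x

def debiased_mean(zs, eps):
    """Unbiased estimate of the true mean recovered from privatized bits:
    invert the affine map E[z] = (1 - p) + (2p - 1) * mean(x)."""
    p = math.exp(eps) / (1.0 + math.exp(eps))
    return (sum(zs) / len(zs) - (1.0 - p)) / (2.0 * p - 1.0)
```

The analyst only ever sees the privatized bits `zs`, yet the debiased mean concentrates around the true mean; the paper's CIs and CSs quantify exactly how fast, for arbitrary bounded (not just binary) data.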
    Stochastic coordinate transformations with applications to robust machine learning. (arXiv:2110.01729v3 [stat.ML] UPDATED)
In this paper we introduce a set of novel features for identifying underlying stochastic behavior of input data using the Karhunen-Loeve expansion. These novel features are constructed by applying a coordinate transformation based on the recent Functional Data Analysis theory for anomaly detection. The associated signal decomposition is an exact hierarchical tensor product expansion with known optimality properties for approximating stochastic processes (random fields) with finite dimensional function spaces. In principle these low dimensional spaces can capture most of the stochastic behavior of `underlying signals' in a given nominal class, and can reject signals in alternative classes as stochastic anomalies. Using a hierarchical finite dimensional expansion of the nominal class, a series of orthogonal nested subspaces is constructed for detecting anomalous signal components. Projection coefficients of input data in these subspaces are then used to train a Machine Learning (ML) classifier. Moreover, due to the split of the signal into nominal and anomalous projection components, clearer separation surfaces between the classes arise. In fact we show that with a sufficiently accurate estimation of the covariance structure of the nominal class, a sharp classification can be obtained. This is particularly advantageous for situations with large unbalanced datasets. We formulate this concept and demonstrate it on a number of high-dimensional datasets. This approach yields significant increases in accuracy over ML methods that use the original feature data. Our tests on the Alzheimer's Disease ADNI dataset show a dramatic increase in accuracy (from 48% to 89%). Furthermore, tests on unbalanced semi-synthetic datasets created from the GCM data confirmed increased accuracy as the dataset becomes more unbalanced.
    Factor-augmented tree ensembles. (arXiv:2111.14000v6 [stat.ML] UPDATED)
This manuscript proposes to extend the information set of time-series regression trees with latent stationary factors extracted via state-space methods. In doing so, this approach generalises time-series regression trees along two dimensions. First, it allows handling predictors that exhibit measurement error, non-stationary trends, seasonality and/or irregularities such as missing observations. Second, it gives a transparent way to use domain-specific theory to inform time-series regression trees. Empirically, ensembles of these factor-augmented trees provide a reliable approach for macro-finance problems. This article illustrates this by focusing on the lead-lag effect between equity volatility and the business cycle in the United States.
    Robustly Learning a Single Neuron via Sharpness. (arXiv:2306.07892v1 [cs.LG])
We study the problem of learning a single neuron with respect to the $L_2^2$-loss in the presence of adversarial label noise. We give an efficient algorithm that, for a broad family of activations including ReLUs, approximates the optimal $L_2^2$-error within a constant factor. Our algorithm applies under much milder distributional assumptions compared to prior work. The key ingredient enabling our results is a novel connection to local error bounds from optimization theory.
    Distribution Free Prediction Sets for Node Classification. (arXiv:2211.14555v2 [stat.ML] UPDATED)
Graph Neural Networks (GNNs) are able to achieve high classification accuracy on many important real world datasets, but provide no rigorous notion of predictive uncertainty. Quantifying the confidence of GNN models is difficult due to the dependence between datapoints induced by the graph structure. We leverage recent advances in conformal prediction to construct prediction sets for node classification in inductive learning scenarios. We do this by taking an existing approach for conformal classification that relies on \textit{exchangeable} data and modifying it by appropriately weighting the conformal scores to reflect the network structure. We show through experiments on standard benchmark datasets using popular GNN models that our approach provides tighter and better calibrated prediction sets than a naive application of conformal prediction.
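The unweighted, exchangeable split-conformal baseline that the paper reweights can be sketched in a few lines (the network-structure weighting itself, which is the paper's contribution, is omitted here; names are illustrative):

```python
import math

def conformal_set(cal_scores, test_scores, alpha=0.1):
    """Split-conformal prediction set for classification.

    cal_scores: nonconformity score of each calibration example's true label.
    test_scores: dict mapping candidate label -> nonconformity at the test point.
    Returns the set of labels whose score falls below the conformal quantile,
    guaranteeing >= 1 - alpha coverage under exchangeability.
    """
    n = len(cal_scores)
    # Conformal quantile with the finite-sample (n + 1) correction.
    rank = math.ceil((n + 1) * (1 - alpha)) - 1
    qhat = sorted(cal_scores)[min(rank, n - 1)]
    return {label for label, s in test_scores.items() if s <= qhat}
```

Graph data breaks the exchangeability assumption behind the `(n + 1)` correction, which is precisely why the paper replaces the uniform quantile with one weighted by the network structure.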
    Differential Privacy with Random Projections and Sign Random Projections. (arXiv:2306.01751v2 [cs.CR] UPDATED)
    In this paper, we develop a series of differential privacy (DP) algorithms from a family of random projections (RP) for general applications in machine learning, data mining, and information retrieval. Among the presented algorithms, iDP-SignRP is remarkably effective under the setting of ``individual differential privacy'' (iDP), based on sign random projections (SignRP). Also, DP-SignOPORP considerably improves existing algorithms in the literature under the standard DP setting, using ``one permutation + one random projection'' (OPORP), where OPORP is a variant of the celebrated count-sketch method with fixed-length binning and normalization. Without taking signs, among the DP-RP family, DP-OPORP achieves the best performance. Our key idea for improving DP-RP is to take only the signs, i.e., $sign(x_j) = sign\left(\sum_{i=1}^p u_i w_{ij}\right)$, of the projected data. The intuition is that the signs often remain unchanged when the original data ($u$) exhibit small changes (according to the ``neighbor'' definition in DP). In other words, the aggregation and quantization operations themselves provide good privacy protections. We develop a technique called ``smooth flipping probability'' that incorporates this intuitive privacy benefit of SignRPs and improves the standard DP bit flipping strategy. Based on this technique, we propose DP-SignOPORP which satisfies strict DP and outperforms other DP variants based on SignRP (and RP), especially when $\epsilon$ is not very large (e.g., $\epsilon = 5\sim10$). Moreover, if an application scenario accepts individual DP, then we immediately obtain an algorithm named iDP-SignRP which achieves excellent utilities even at small~$\epsilon$ (e.g., $\epsilon<0.5$).
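The sign-only projection at the heart of SignRP can be sketched directly from the formula in the abstract; the DP noise and bit-flipping machinery is omitted, and Gaussian directions are one illustrative choice of random projection:

```python
import random

def sign_rp(u, W):
    """Keep only sign(<u, w_j>) for each projection row w_j, i.e.
    sign(x_j) = sign(sum_i u_i * w_ij). Returns a vector of +/-1."""
    return [1 if sum(ui * wij for ui, wij in zip(u, row)) >= 0 else -1
            for row in W]

def gaussian_projections(p, k, seed=0):
    """k random Gaussian directions in R^p (illustrative, not the paper's
    specific OPORP construction)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(k)]
```

Note that the signs are invariant to positive rescaling of `u`, which illustrates the abstract's intuition: small changes to the original data often leave all signs, and hence the privatized output, unchanged.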
    Concentration Bounds for Discrete Distribution Estimation in KL Divergence. (arXiv:2302.06869v2 [stat.ML] UPDATED)
    We study the problem of discrete distribution estimation in KL divergence and provide concentration bounds for the Laplace estimator. We show that the deviation from mean scales as $\sqrt{k}/n$ when $n \ge k$, improving upon the best prior result of $k/n$. We also establish a matching lower bound that shows that our bounds are tight up to polylogarithmic factors.
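The objects in this bound are simple to state; a short sketch of the Laplace (add-one) estimator and the KL divergence it is measured in:

```python
import math

def laplace_estimator(counts):
    """Add-one smoothed estimate of a discrete distribution over k symbols
    from n = sum(counts) samples: p_hat_i = (c_i + 1) / (n + k)."""
    n, k = sum(counts), len(counts)
    return [(c + 1) / (n + k) for c in counts]

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; assumes q > 0 wherever p > 0
    (the Laplace estimator guarantees this, since every q_i > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

Smoothing is what makes KL finite: an unsmoothed empirical estimate assigns zero mass to unseen symbols, giving infinite `KL(p || p_hat)` whenever the true `p` supports them.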
    How to Trust Your Diffusion Model: A Convex Optimization Approach to Conformal Risk Control. (arXiv:2302.03791v2 [stat.ML] UPDATED)
Score-based generative modeling, informally referred to as diffusion models, continues to grow in popularity across several important domains and tasks. While these models provide high-quality and diverse samples from empirical distributions, important questions remain on the reliability and trustworthiness of their sampling procedures for responsible use in critical scenarios. Conformal prediction is a modern tool to construct finite-sample, distribution-free uncertainty guarantees for any black-box predictor. In this work, we focus on image-to-image regression tasks and present a generalization of the Risk-Controlling Prediction Sets (RCPS) procedure, which we term $K$-RCPS, that allows one to $(i)$ provide entrywise calibrated intervals for future samples of any diffusion model, and $(ii)$ control a certain notion of risk with respect to a ground truth image with minimal mean interval length. Differently from existing conformal risk control procedures, ours relies on a novel convex optimization approach that allows for multidimensional risk control while provably minimizing the mean interval length. We illustrate our approach on two real-world image denoising problems: on natural images of faces as well as on computed tomography (CT) scans of the abdomen, demonstrating state-of-the-art performance.
    Partial Identification of Dose Responses with Hidden Confounders. (arXiv:2204.11206v3 [stat.ME] UPDATED)
    Inferring causal effects of continuous-valued treatments from observational data is a crucial task promising to better inform policy- and decision-makers. A critical assumption needed to identify these effects is that all confounding variables -- causal parents of both the treatment and the outcome -- are included as covariates. Unfortunately, given observational data alone, we cannot know with certainty that this criterion is satisfied. Sensitivity analyses provide principled ways to give bounds on causal estimates when confounding variables are hidden. While much attention is focused on sensitivity analyses for discrete-valued treatments, much less is paid to continuous-valued treatments. We present novel methodology to bound both average and conditional average continuous-valued treatment-effect estimates when they cannot be point identified due to hidden confounding. A semi-synthetic benchmark on multiple datasets shows our method giving tighter coverage of the true dose-response curve than a recently proposed continuous sensitivity model and baselines. Finally, we apply our method to a real-world observational case study to demonstrate the value of identifying dose-dependent causal effects.
    Implicit models, latent compression, intrinsic biases, and cheap lunches in community detection. (arXiv:2210.09186v6 [cs.SI] UPDATED)
    The task of community detection, which aims to partition a network into clusters of nodes to summarize its large-scale structure, has spawned the development of many competing algorithms with varying objectives. Some community detection methods are inferential, explicitly deriving the clustering objective through a probabilistic generative model, while other methods are descriptive, dividing a network according to an objective motivated by a particular application, making it challenging to compare these methods on the same scale. Here we present a solution to this problem that associates any community detection objective, inferential or descriptive, with its corresponding implicit network generative model. This allows us to compute the description length of a network and its partition under arbitrary objectives, providing a principled measure to compare the performance of different algorithms without the need for "ground truth" labels. Our approach also gives access to instances of the community detection problem that are optimal to any given algorithm, and in this way reveals intrinsic biases in popular descriptive methods, explaining their tendency to overfit. Using our framework, we compare a number of community detection methods on artificial networks, and on a corpus of over 500 structurally diverse empirical networks. We find that more expressive community detection methods exhibit consistently superior compression performance on structured data instances, without having degraded performance on a minority of situations where more specialized algorithms perform optimally. Our results undermine the implications of the "no free lunch" theorem for community detection, both conceptually and in practice, since it is confined to unstructured data instances, unlike relevant community detection problems which are structured by requirement.
    Differentiating Metropolis-Hastings to Optimize Intractable Densities. (arXiv:2306.07961v1 [stat.ML])
    When performing inference on probabilistic models, target densities often become intractable, necessitating the use of Monte Carlo samplers. We develop a methodology for unbiased differentiation of the Metropolis-Hastings sampler, allowing us to differentiate through probabilistic inference. By fusing recent advances in stochastic differentiation with Markov chain coupling schemes, the procedure can be made unbiased, low-variance, and automatic. This allows us to apply gradient-based optimization to objectives expressed as expectations over intractable target densities. We demonstrate our approach by finding an ambiguous observation in a Gaussian mixture model and by maximizing the specific heat in an Ising model.
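The object being differentiated is the ordinary Metropolis-Hastings accept/reject step; a plain (non-differentiated) sketch, with a symmetric Gaussian proposal as one standard choice:

```python
import math
import random

def mh_step(x, log_density, step=1.0, rng=random):
    """One Metropolis-Hastings step targeting exp(log_density), with a
    symmetric Gaussian random-walk proposal. The discrete accept/reject
    here is what makes naive differentiation through the sampler hard."""
    proposal = x + rng.gauss(0.0, step)
    log_accept = log_density(proposal) - log_density(x)
    if log_accept >= 0 or math.log(rng.random()) < log_accept:
        return proposal
    return x
```

Iterating this step yields samples from the target; the paper's contribution is an unbiased, low-variance way to differentiate expectations over such chains via stochastic differentiation and chain coupling.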
    Multi-Fidelity Multi-Armed Bandits Revisited. (arXiv:2306.07761v1 [cs.LG])
We study the multi-fidelity multi-armed bandit (MF-MAB), an extension of the canonical multi-armed bandit (MAB) problem. MF-MAB allows each arm to be pulled with different costs (fidelities) and observation accuracy. We study both the best arm identification with fixed confidence (BAI) and the regret minimization objectives. For BAI, we present (a) a cost complexity lower bound, (b) an algorithmic framework with two alternative fidelity selection procedures, and (c) both procedures' cost complexity upper bounds. From both cost complexity bounds of MF-MAB, one can recover the standard sample complexity bounds of the classic (single-fidelity) MAB. For regret minimization of MF-MAB, we propose a new regret definition, prove its problem-independent regret lower bound $\Omega(K^{1/3}\Lambda^{2/3})$ and problem-dependent lower bound $\Omega(K\log \Lambda)$, where $K$ is the number of arms and $\Lambda$ is the decision budget in terms of cost, and devise an elimination-based algorithm whose worst-cost regret upper bound matches its corresponding lower bound up to logarithmic terms, and whose problem-dependent bound matches its corresponding lower bound in terms of $\Lambda$.
    Kernel Random Projection Depth for Outlier Detection. (arXiv:2306.07056v2 [stat.ML] UPDATED)
    This paper proposes an extension of Random Projection Depth (RPD) to cope with multiple modalities and non-convexity on data clouds. In the framework of the proposed method, the RPD is computed in a reproducing kernel Hilbert space. With the help of kernel principal component analysis, we expect the proposed method to cope with such multiple modalities and non-convexity. The experimental results demonstrate that the proposed method outperforms RPD and is comparable to other existing detection models on benchmark datasets in terms of the area under the receiver operating characteristic curve (AUC).
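For intuition, a Monte Carlo sketch of the plain (non-kernel) Random Projection Depth that the paper extends: depth is high for central points and low for outliers. The direction count and the toy Gaussian cloud are illustrative assumptions; the kernelized version via kernel PCA is not shown:

```python
import random
import statistics

def random_projection_depth(x, data, n_dirs=200, seed=0):
    """Monte Carlo approximation of RPD in R^d:
    Depth(x) = 1 / (1 + sup_u |u.x - median| / MAD),
    with the supremum over random unit directions u.
    """
    rng = random.Random(seed)
    d = len(x)
    worst = 0.0
    for _ in range(n_dirs):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = sum(c * c for c in u) ** 0.5
        u = [c / norm for c in u]
        proj = [sum(ui * xi for ui, xi in zip(u, row)) for row in data]
        med = statistics.median(proj)
        mad = statistics.median(abs(p - med) for p in proj) or 1e-12
        px = sum(ui * xi for ui, xi in zip(u, x))
        worst = max(worst, abs(px - med) / mad)
    return 1.0 / (1.0 + worst)

rng = random.Random(42)
cloud = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
depth_center = random_projection_depth([0.0, 0.0], cloud)    # central point
depth_outlier = random_projection_depth([10.0, 10.0], cloud)  # far outlier
```

An outlier's depth should fall well below that of a central point.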
    Fixed points of arbitrarily deep 1-dimensional neural networks. (arXiv:2303.12814v2 [stat.ML] UPDATED)
    In this paper, we establish a sharp upper bound on the number of fixed points a certain class of neural networks can have. The networks under study (autoencoders) can be viewed as discrete dynamical systems whose nonlinearities are given by the choice of activation functions. To this end, we introduce a new class $\mathcal{F}$ of $C^1$ activation functions that is closed under composition, and contains e.g. the logistic sigmoid function. We use this class to show that any 1-dimensional neural network of arbitrary depth with activation functions in $\mathcal{F}$ has at most three fixed points. Due to the simple nature of such networks, we are able to completely understand their fixed points, providing a foundation for the much-needed connection between the application and theory of deep neural networks.
    Adaptive Stopping Rule for Kernel-based Gradient Descent Algorithms. (arXiv:2001.02879v2 [cs.LG] UPDATED)
    In this paper, we propose an adaptive stopping rule for kernel-based gradient descent (KGD) algorithms. We introduce the empirical effective dimension to quantify the increments of iterations in KGD and derive an implementable early stopping strategy. We analyze the performance of the adaptive stopping rule in the framework of learning theory. Using the recently developed integral operator approach, we rigorously prove the optimality of the adaptive stopping rule by showing that KGD equipped with this rule achieves the optimal learning rates. Furthermore, a sharp bound on the number of iterations in KGD equipped with the proposed early stopping rule is also given to demonstrate its computational advantage.
    Density-Softmax: Scalable and Calibrated Uncertainty Estimation under Distribution Shifts. (arXiv:2302.06495v2 [cs.LG] UPDATED)
    Prevalent deterministic deep-learning models suffer from significant over-confidence under distribution shifts. Probabilistic approaches can reduce this problem but struggle with computational efficiency. In this paper, we propose Density-Softmax, a fast and lightweight deterministic method to improve calibrated uncertainty estimation by combining a density function with the softmax layer. By using the latent representation's likelihood value, our approach produces more uncertain predictions when test samples are distant from the training samples. Theoretically, we show that Density-Softmax can produce high-quality uncertainty estimation with neural networks, as it is the solution of the minimax uncertainty risk and is distance-aware, thus reducing the over-confidence of the standard softmax. Empirically, our method matches the computational efficiency of a single deterministic forward pass with a standard softmax on shifted toy, vision, and language datasets across modern deep-learning architectures. Notably, Density-Softmax uses 4 times fewer parameters than Deep Ensembles and has 6 times lower latency than a Rank-1 Bayesian Neural Network, while obtaining competitive predictive performance and lower calibration errors under distribution shifts.
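A toy sketch of the core mechanism, not the paper's implementation: scale the logits by a normalized likelihood of the latent representation, so the softmax flattens toward uniform far from the training data. A 1-D Gaussian stands in for the learned density model, and all numbers below are assumptions:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def density_softmax(logits, z, mu, sigma):
    """Density-Softmax sketch: multiply logits by a likelihood in (0, 1],
    computed from the latent z under a Gaussian fit to training latents
    (mu, sigma assumed here). Far from the data, predictions flatten.
    """
    lik = math.exp(-0.5 * ((z - mu) / sigma) ** 2)  # 1 at the mode, -> 0 far away
    return softmax([lik * v for v in logits])

logits = [2.0, 0.5, -1.0]
p_in = density_softmax(logits, z=0.1, mu=0.0, sigma=1.0)   # near training data
p_out = density_softmax(logits, z=8.0, mu=0.0, sigma=1.0)  # far from training data
```

The in-distribution prediction stays confident while the distant sample is pushed toward the uniform distribution.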
    Decentralized Hyper-Gradient Computation over Time-Varying Directed Networks. (arXiv:2210.02129v3 [stat.ML] UPDATED)
    This paper addresses the communication issues when estimating hyper-gradients in decentralized federated learning (FL). Hyper-gradients in decentralized FL quantify how the performance of the globally shared optimal model is influenced by perturbations in clients' hyper-parameters. In prior work, clients trace this influence through the communication of Hessian matrices over a static undirected network, resulting in (i) excessive communication costs and (ii) inability to make use of more efficient and robust networks, namely, time-varying directed networks. To solve these issues, we introduce an alternative optimality condition for FL using an averaging operation on model parameters and gradients. We then employ Push-Sum as the averaging operation, which is a consensus optimization technique for time-varying directed networks. As a result, the hyper-gradient estimator derived from our optimality condition enjoys two desirable properties: (i) it only requires Push-Sum communication of vectors and (ii) it can operate over time-varying directed networks. We confirm the convergence of our estimator to the true hyper-gradient both theoretically and empirically, and we further demonstrate that it enables two novel applications: decentralized influence estimation and personalization over time-varying networks.
    MARS via LASSO. (arXiv:2111.11694v2 [math.ST] UPDATED)
    Multivariate adaptive regression splines (MARS) is a popular method for nonparametric regression introduced by Friedman in 1991. MARS fits simple nonlinear and non-additive functions to regression data. We propose and study a natural lasso variant of the MARS method. Our method is based on least squares estimation over a convex class of functions obtained by considering infinite-dimensional linear combinations of functions in the MARS basis and imposing a variation based complexity constraint. Our estimator can be computed via finite-dimensional convex optimization, although it is defined as a solution to an infinite-dimensional optimization problem. Under a few standard design assumptions, we prove that our estimator achieves a rate of convergence that depends only logarithmically on dimension and thus avoids the usual curse of dimensionality to some extent. We also show that our method is naturally connected to nonparametric estimation techniques based on smoothness constraints. We implement our method with a cross-validation scheme for the selection of the involved tuning parameter and compare it to the usual MARS method in various simulation and real data settings.
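For readers unfamiliar with the MARS basis that the lasso variant builds on, each basis function is a product of hinge functions $(\pm(x_j - t))_+$. A minimal sketch with a hypothetical two-factor term:

```python
def hinge(x, knot, sign=+1):
    """A MARS basis factor: the positive part (sign * (x - knot))_+."""
    return max(sign * (x - knot), 0.0)

def mars_term(point, factors):
    """Product of hinge functions; `factors` is a list of
    (coordinate, knot, sign) triples. Linear combinations of such
    products form the function class the MARS-via-lasso estimator
    optimizes over (with a variation-based complexity constraint).
    """
    out = 1.0
    for coord, knot, sign in factors:
        out *= hinge(point[coord], knot, sign)
    return out

# f(x) = 2 * (x0 - 1)_+ * (2 - x1)_+  evaluated at (3, 0.5):
value = 2.0 * mars_term([3.0, 0.5], [(0, 1.0, +1), (1, 2.0, -1)])
```

Here `value` is `2 * max(3 - 1, 0) * max(2 - 0.5, 0) = 6.0`.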
    Homophily modulates double descent generalization in graph convolution networks. (arXiv:2212.13069v2 [cs.LG] UPDATED)
    Graph neural networks are among the most successful machine learning models for relational datasets like metabolic, transportation, and social networks. Yet the determinants of their strong generalization for diverse interactions encoded in the data are not well understood. Methods from statistical learning theory do not explain emergent phenomena such as double descent or the dependence of risk on the nature of interactions. We use analytical tools from statistical physics and random matrix theory to precisely characterize generalization in simple graph convolution networks on the contextual stochastic block model. The derived curves are phenomenologically rich: they explain the distinction between learning on homophilic and heterophilic graphs, and they predict double descent, whose existence in GNNs has been questioned by recent work. We show how risk depends on the interplay between the noise in the graph, noise in the features, and the proportion of nodes used for training. Our analysis predicts the qualitative behavior not only of a stylized graph learning model but also of complex GNNs on messy real-world datasets. As a case in point, we use these analytic insights about heterophily and self-loop signs to improve performance of state-of-the-art graph convolution networks on several heterophilic benchmarks by a simple addition of negative self-loop filters.
    A Hypergraph-Based Machine Learning Ensemble Network Intrusion Detection System. (arXiv:2211.03933v2 [cs.CR] UPDATED)
    Network intrusion detection systems (NIDS) continue to face challenges in detecting malicious attacks. NIDS are often developed offline, yet they face auto-generated port scan infiltration attempts, resulting in a significant time lag between adversarial adaptation and the NIDS response. To address these challenges, we use hypergraphs focused on internet protocol addresses and destination ports to capture evolving patterns of port scan attacks. The derived set of hypergraph-based metrics is then used to train an ensemble machine learning (ML) based NIDS that allows for real-time adaptation in monitoring and detecting port scanning activities, other types of attacks, and adversarial intrusions with high accuracy, precision, and recall. This adaptive ML NIDS was developed through the combination of (1) intrusion examples, (2) NIDS update rules, (3) attack threshold choices to trigger NIDS retraining requests, and (4) a production environment with no prior knowledge of the nature of network traffic. Forty scenarios were auto-generated to evaluate the ML ensemble NIDS comprising three tree-based models. The resulting ML ensemble NIDS was extended and evaluated with the CIC-IDS2017 dataset. Results show that under the model settings of an Update-ALL-NIDS rule (specifically, retrain and update all three models upon the same NIDS retraining request) the proposed ML ensemble NIDS evolved intelligently and produced the best results, with nearly 100% detection performance throughout the simulation.
    Additive interaction modelling using I-priors. (arXiv:2007.15766v4 [math.ST] UPDATED)
    Additive regression models with interactions are widely studied in the literature, using methods such as splines or Gaussian process regression. However, these methods can pose challenges for estimation and model selection, due to the presence of many smoothing parameters and the lack of suitable criteria. We propose to address these challenges by extending the I-prior methodology (Bergsma, 2020) to multiple covariates, which may be multidimensional. The I-prior methodology has some advantages over other methods, such as Gaussian process regression and Tikhonov regularization, both theoretically and practically. In particular, the I-prior is a proper prior, is based on minimal assumptions, yields an admissible posterior mean, and estimation of the scale (or smoothing) parameters can be done using an EM algorithm with simple E and M steps. Moreover, we introduce a parsimonious specification of models with interactions, which has two benefits: (i) it reduces the number of scale parameters and thus facilitates the estimation of models with interactions, and (ii) it enables straightforward model selection (among models with different interactions) based on the marginal likelihood.
    Does generalization performance of $l^q$ regularization learning depend on $q$? A negative example. (arXiv:1307.6616v2 [cs.LG] UPDATED)
    $l^q$-regularization has been demonstrated to be an attractive technique in machine learning and statistical modeling. It attempts to improve the generalization (prediction) capability of a machine (model) by appropriately shrinking its coefficients. The shape of a $l^q$ estimator differs with varying choices of the regularization order $q$. In particular, $l^1$ leads to the LASSO estimate, while $l^{2}$ corresponds to the smooth ridge regression. This makes the order $q$ a potential tuning parameter in applications. To facilitate the use of $l^{q}$-regularization, we seek a modeling strategy in which an elaborate selection of $q$ can be avoided. In this spirit, we place our investigation within a general framework of $l^{q}$-regularized kernel learning under a sample dependent hypothesis space (SDHS). For a designated class of kernel functions, we show that all $l^{q}$ estimators for $0< q < \infty$ attain similar generalization error bounds. These bounds are almost optimal in the sense that, up to a logarithmic factor, the upper and lower bounds are asymptotically identical. This finding tentatively reveals that, in some modeling contexts, the choice of $q$ might not have a strong impact on the generalization capability. From this perspective, $q$ can be arbitrarily specified, or specified merely by criteria unrelated to generalization, such as smoothness, computational complexity, or sparsity.
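The objects compared across $q$ are easy to state concretely. A minimal sketch of an $l^q$-penalized empirical risk, with squared loss and toy numbers assumed:

```python
def lq_objective(coefs, residuals, lam, q):
    """Empirical risk plus an l^q penalty, 0 < q < infinity.

    For q = 1 this is the LASSO objective and for q = 2 ridge regression;
    the abstract's point is that in the kernel setting studied, the
    generalization bounds are largely insensitive to the choice of q.
    """
    loss = sum(r * r for r in residuals) / len(residuals)
    penalty = sum(abs(c) ** q for c in coefs)
    return loss + lam * penalty

# Same coefficients and residuals, two penalty orders:
obj_l1 = lq_objective([0.5, -0.2], [0.1, -0.3], lam=0.1, q=1)
obj_l2 = lq_objective([0.5, -0.2], [0.1, -0.3], lam=0.1, q=2)
```

With these numbers, the $l^1$ objective is $0.05 + 0.1 \cdot 0.7 = 0.12$ and the $l^2$ objective is smaller, since all coefficients lie inside the unit interval.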
    Learning under Selective Labels with Heterogeneous Decision-makers: An Instrumental Variable Approach. (arXiv:2306.07566v1 [stat.ML])
    We study the problem of learning with selectively labeled data, which arises when outcomes are only partially labeled due to historical decision-making. The labeled data distribution may substantially differ from the full population, especially when the historical decisions and the target outcome can be simultaneously affected by some unobserved factors. Consequently, learning with only the labeled data may lead to severely biased results when deployed to the full population. Our paper tackles this challenge by exploiting the fact that in many applications the historical decisions were made by a set of heterogeneous decision-makers. In particular, we analyze this setup in a principled instrumental variable (IV) framework. We establish conditions for the full-population risk of any given prediction rule to be point-identified from the observed data and provide sharp risk bounds when the point identification fails. We further propose a weighted learning approach that learns prediction rules robust to the label selection bias in both identification settings. Finally, we apply our proposed approach to a semi-synthetic financial dataset and demonstrate its superior performance in the presence of selection bias.
    On Achieving Optimal Adversarial Test Error. (arXiv:2306.07544v1 [cs.LG])
    We first elucidate various fundamental properties of optimal adversarial predictors: the structure of optimal adversarial convex predictors in terms of optimal adversarial zero-one predictors, bounds relating the adversarial convex loss to the adversarial zero-one loss, and the fact that continuous predictors can get arbitrarily close to the optimal adversarial error for both convex and zero-one losses. Applying these results along with new Rademacher complexity bounds for adversarial training near initialization, we prove that for general data distributions and perturbation sets, adversarial training on shallow networks with early stopping and an idealized optimal adversary is able to achieve optimal adversarial test error. By contrast, prior theoretical work either considered specialized data distributions or only provided training error guarantees.
    Conjugate Natural Selection. (arXiv:2208.13898v4 [cs.LG] UPDATED)
    We prove that Fisher-Rao natural gradient descent (FR-NGD) optimally approximates the continuous time replicator equation (an essential model of evolutionary dynamics), and term this correspondence "conjugate natural selection". This correspondence promises alternative approaches for evolutionary computation over continuous or high-dimensional hypothesis spaces. As a special case, FR-NGD also provides the optimal approximation of continuous Bayesian inference when hypotheses compete on the basis of predicting actual observations. In this case, the method avoids the need to compute prior probabilities. We demonstrate our findings on a non-convex optimization problem and a system identification task for a stochastic process with time-varying parameters.
    A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning. (arXiv:2306.07465v1 [cs.LG])
    We investigate learning the equilibria in non-stationary multi-agent systems and address the challenges that differentiate multi-agent learning from single-agent learning. Specifically, we focus on games with bandit feedback, where testing an equilibrium can result in substantial regret even when the gap to be tested is small, and where the existence of multiple optimal solutions (equilibria) in stationary games poses extra challenges. To overcome these obstacles, we propose a versatile black-box approach applicable to a broad spectrum of problems, such as general-sum games, potential games, and Markov games, when equipped with appropriate learning and testing oracles for stationary environments. Our algorithms can achieve $\widetilde{O}\left(\Delta^{1/4}T^{3/4}\right)$ regret when the degree of nonstationarity, as measured by total variation $\Delta$, is known, and $\widetilde{O}\left(\Delta^{1/5}T^{4/5}\right)$ regret when $\Delta$ is unknown, where $T$ is the number of rounds. Meanwhile, our algorithm inherits the favorable dependence on the number of agents from the oracles. As a side contribution that may be of independent interest, we show how to test for various types of equilibria by a black-box reduction to single-agent learning, including Nash equilibria, correlated equilibria, and coarse correlated equilibria.
    Causal Mediation Analysis with Multi-dimensional and Indirectly Observed Mediators. (arXiv:2306.07918v1 [cs.LG])
    Causal mediation analysis (CMA) is a powerful method to dissect the total effect of a treatment into direct and mediated effects within the potential outcome framework. This is important in many scientific applications to identify the underlying mechanisms of a treatment effect. However, in many scientific applications the mediator is unobserved, but there may exist related measurements. For example, we may want to identify how changes in brain activity or structure mediate an antidepressant's effect on behavior, but we may only have access to electrophysiological or imaging brain measurements. To date, most CMA methods assume that the mediator is one-dimensional and observable, which oversimplifies such real-world scenarios. To overcome this limitation, we introduce a CMA framework that can handle complex and indirectly observed mediators based on the identifiable variational autoencoder (iVAE) architecture. We prove that the true joint distribution over observed and latent variables is identifiable with the proposed method. Additionally, our framework captures a disentangled representation of the indirectly observed mediator and yields accurate estimation of the direct and mediated effects in synthetic and semi-synthetic experiments, providing evidence of its potential utility in real-world applications.
    Solving the Dirichlet problem for the Monge-Amp\`ere equation using neural networks. (arXiv:2110.03310v3 [stat.ML] UPDATED)
    The Monge-Amp\`ere equation is a fully nonlinear partial differential equation (PDE) of fundamental importance in analysis, geometry and in the applied sciences. In this paper we solve the Dirichlet problem associated with the Monge-Amp\`ere equation using neural networks and we show that an ansatz using deep input convex neural networks can be used to find the unique convex solution. As part of our analysis we study the effect of singularities, discontinuities and noise in the source function, we consider nontrivial domains, and we investigate how the method performs in higher dimensions. We investigate the convergence numerically and present error estimates based on a stability result. We also compare this method to an alternative approach in which standard feed-forward networks are used together with a loss function which penalizes lack of convexity.
    The minimax risk in testing the histogram of discrete distributions for uniformity under missing ball alternatives. (arXiv:2305.18111v2 [math.ST] UPDATED)
    We consider the problem of testing the fit of a discrete sample of items from many categories to the uniform distribution over the categories. As a class of alternative hypotheses, we consider the removal of an $\ell_p$ ball of radius $\epsilon$ around the uniform rate sequence for $p \leq 2$. We deliver a sharp characterization of the asymptotic minimax risk when $\epsilon \to 0$ as the number of samples and number of dimensions go to infinity, for testing based on the occurrences' histogram (number of absent categories, singletons, collisions, ...). For example, for $p=1$ and in the limit of a small expected number of samples $n$ compared to the number of categories $N$ (aka "sub-linear" regime), the minimax risk $R^*_\epsilon$ asymptotes to $2 \bar{\Phi}\left(n \epsilon^2/\sqrt{8N}\right) $, with $\bar{\Phi}(x)$ the normal survival function. Empirical studies over a range of problem parameters show that this estimate is accurate in finite samples, and that our test is significantly better than the chi-squared test or a test that only uses collisions. Our analysis is based on the asymptotic normality of histogram ordinates, the equivalence between the minimax setting and a Bayesian one, and the reduction of a multi-dimensional optimization problem to a one-dimensional problem.
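The quoted asymptote is directly computable with the normal survival function $\bar{\Phi}(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$; a short sketch with illustrative problem sizes:

```python
import math

def normal_survival(x):
    """Phi-bar(x) = P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def asymptotic_minimax_risk(n, N, eps):
    """The abstract's p = 1, sub-linear-regime asymptote:
    R* ~ 2 * Phi-bar(n * eps^2 / sqrt(8 N))."""
    return 2.0 * normal_survival(n * eps ** 2 / math.sqrt(8.0 * N))

# With eps = 0 the risk is 1 (type-I plus type-II error of a blind test);
# it decays as the sample size n or the radius eps grows.
r_null = asymptotic_minimax_risk(n=1000, N=10_000, eps=0.0)
r_some = asymptotic_minimax_risk(n=1000, N=10_000, eps=0.5)
```

The risk is 1 at $\epsilon = 0$ and decreases monotonically in both $n$ and $\epsilon$.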
    Simulation-Based Frequentist Inference with Tractable and Intractable Likelihoods. (arXiv:2306.07769v1 [stat.ME])
    High-fidelity simulators that connect theoretical models with observations are indispensable tools in many sciences. When coupled with machine learning, a simulator makes it possible to infer the parameters of a theoretical model directly from real and simulated observations without explicit use of the likelihood function. This is of particular interest when the latter is intractable. We introduce a simple modification of the recently proposed likelihood-free frequentist inference (LF2I) approach that has some computational advantages. The utility of our algorithm is illustrated by applying it to three pedagogically interesting examples: the first is from cosmology, the second from high-energy physics and astronomy, both with tractable likelihoods, while the third, with an intractable likelihood, is from epidemiology.
    DIVA: A Dirichlet Process Based Incremental Deep Clustering Algorithm via Variational Auto-Encoder. (arXiv:2305.14067v2 [cs.LG] UPDATED)
    Generative model-based deep clustering frameworks excel in classifying complex data, but are limited in handling dynamic and complex features because they require prior knowledge of the number of clusters. In this paper, we propose a nonparametric deep clustering framework that employs an infinite mixture of Gaussians as a prior. Our framework utilizes a memoized online variational inference method that enables the "birth" and "merge" moves of clusters, allowing our framework to cluster data in a "dynamic-adaptive" manner, without requiring prior knowledge of the number of features. We name this framework DIVA, a Dirichlet Process-based Incremental deep clustering framework via Variational Auto-Encoder. Our framework, which outperforms state-of-the-art baselines, exhibits superior performance in classifying complex data with dynamically changing features, particularly in the case of incremental features. We released our source code implementation at: https://github.com/Ghiara/diva
    Omega: Optimistic EMA Gradients. (arXiv:2306.07905v1 [cs.LG])
    Stochastic min-max optimization has gained interest in the machine learning community with the advancements in GANs and adversarial training. Although game optimization is fairly well understood in the deterministic setting, some issues persist in the stochastic regime. Recent work has shown that stochastic gradient descent-ascent methods such as the optimistic gradient are highly sensitive to noise or can fail to converge. Although alternative strategies exist, they can be prohibitively expensive. We introduce Omega, a method with optimistic-like updates that mitigates the impact of noise by incorporating an EMA of historic gradients in its update rule. We also explore a variation of this algorithm that incorporates momentum. Although we do not provide convergence guarantees, our experiments on stochastic games show that Omega outperforms the optimistic gradient method when applied to linear players.
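A hedged sketch of the idea on the bilinear game $\min_x \max_y xy$: the lagged gradient in an optimistic step is replaced by an EMA of past gradients. The exact update rule below is an assumption for illustration, not necessarily the paper's; plain simultaneous GDA is included as a diverging baseline:

```python
def omega_bilinear(x, y, lr=0.1, beta=0.9, steps=200):
    """Optimistic-style updates with an EMA of past gradients on
    min_x max_y x*y (unique equilibrium at the origin).

    Assumed update: the optimistic step 2*g_t - g_{t-1}, with the
    lagged gradient g_{t-1} replaced by an EMA of past gradients.
    """
    mx = my = 0.0
    for _ in range(steps):
        gx, gy = y, -x  # gradient in x; negated gradient in y (ascent as descent)
        mx = beta * mx + (1 - beta) * gx
        my = beta * my + (1 - beta) * gy
        x, y = x - lr * (2 * gx - mx), y - lr * (2 * gy - my)
    return x, y

def gda_bilinear(x, y, lr=0.1, steps=200):
    """Plain simultaneous gradient descent-ascent: spirals outward."""
    for _ in range(steps):
        x, y = x - lr * y, y + lr * x
    return x, y

x_o, y_o = omega_bilinear(1.0, 1.0)
dist_omega = (x_o ** 2 + y_o ** 2) ** 0.5
x_g, y_g = gda_bilinear(1.0, 1.0)
dist_gda = (x_g ** 2 + y_g ** 2) ** 0.5
```

The EMA-optimistic iterates contract toward the equilibrium while the plain GDA baseline drifts away from it.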
    Bandit Quickest Changepoint Detection. (arXiv:2107.10492v3 [cs.LG] UPDATED)
    Many industrial and security applications employ a suite of sensors for detecting abrupt changes in temporal behavior patterns. These abrupt changes typically manifest locally, rendering only a small subset of sensors informative. Continuous monitoring of every sensor can be expensive due to resource constraints, and serves as a motivation for the bandit quickest changepoint detection problem, where sensing actions (or sensors) are sequentially chosen, and only measurements corresponding to chosen actions are observed. We derive an information-theoretic lower bound on the detection delay for a general class of finitely parameterized probability distributions. We then propose a computationally efficient online sensing scheme, which seamlessly balances the need for exploration of different sensing options with exploitation of querying informative actions. We derive expected delay bounds for the proposed scheme and show that these bounds match our information-theoretic lower bounds at low false alarm rates, establishing optimality of the proposed method. We then perform a number of experiments on synthetic and real datasets demonstrating the effectiveness of our proposed method.
    A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning. (arXiv:2306.07818v1 [cs.LG])
    Offline constrained reinforcement learning (RL) aims to learn a policy that maximizes the expected cumulative reward subject to constraints on expected value of cost functions using an existing dataset. In this paper, we propose Primal-Dual-Critic Algorithm (PDCA), a novel algorithm for offline constrained RL with general function approximation. PDCA runs a primal-dual algorithm on the Lagrangian function estimated by critics. The primal player employs a no-regret policy optimization oracle to maximize the Lagrangian estimate given any choices of the critics and the dual player. The dual player employs a no-regret online linear optimization oracle to minimize the Lagrangian estimate given any choices of the critics and the primal player. We show that PDCA can successfully find a near saddle point of the Lagrangian, which is nearly optimal for the constrained RL problem. Unlike previous work that requires concentrability and strong Bellman completeness assumptions, PDCA only requires concentrability and value function/marginalized importance weight realizability assumptions.
    Fischer-Schultz Lecture: Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments, with an Application to Immunization in India. (arXiv:1712.04802v7 [stat.ML] UPDATED)
    We propose strategies to estimate and make inference on key features of heterogeneous effects in randomized experiments. These key features include best linear predictors of the effects using machine learning proxies, average effects sorted by impact groups, and average characteristics of most and least impacted units. The approach is valid in high dimensional settings, where the effects are proxied (but not necessarily consistently estimated) by predictive and causal machine learning methods. We post-process these proxies into estimates of the key features. Our approach is generic; it can be used in conjunction with penalized methods, neural networks, random forests, boosted trees, and ensemble methods, both predictive and causal. Estimation and inference are based on repeated data splitting to avoid overfitting and achieve validity. We use quantile aggregation of the results across many potential splits, in particular taking medians of p-values and medians and other quantiles of confidence intervals. We show that quantile aggregation lowers estimation risks over a single split procedure, and establish its principal inferential properties. Finally, our analysis reveals ways to build provably better machine learning proxies through causal learning: we can use the objective functions that we develop to construct the best linear predictors of the effects, to obtain better machine learning proxies in the initial step. We illustrate the use of both inferential tools and causal learners with a randomized field experiment that evaluates a combination of nudges to stimulate demand for immunization in India.
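The quantile-aggregation step is simple to sketch: take the median p-value across splits and apply a validity correction (doubling the median, as in this line of work; treat the exact correction here as an assumption):

```python
import statistics

def aggregate_pvalues(pvalues):
    """Quantile (median) aggregation of per-split p-values.

    Results from many random data splits are combined by taking the
    median p-value; doubling the median (used here, an assumed form of
    the adjustment) keeps the aggregated test valid.
    """
    return min(1.0, 2.0 * statistics.median(pvalues))

# Five hypothetical splits of the same experiment:
p_combined = aggregate_pvalues([0.01, 0.03, 0.02, 0.20, 0.04])
```

One split with a large p-value (here 0.20) does not dominate the aggregate, which is the robustness the median buys over a single-split procedure.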
    WildWood: a new Random Forest algorithm. (arXiv:2109.08010v2 [cs.LG] UPDATED)
    We introduce WildWood (WW), a new ensemble algorithm for supervised learning of Random Forest (RF) type. While standard RF algorithms use bootstrap out-of-bag samples to compute out-of-bag scores, WW uses these samples to produce improved predictions given by an aggregation of the predictions of all possible subtrees of each fully grown tree in the forest. This is achieved by aggregation with exponential weights over out-of-bag samples, computed exactly and very efficiently thanks to an algorithm called context tree weighting. This improvement, combined with a histogram strategy to accelerate split finding, makes WW fast and competitive compared with other well-established ensemble methods, such as standard RF and extreme gradient boosting algorithms.
    Supervised-Contrastive Loss Learns Orthogonal Frames and Batching Matters. (arXiv:2306.07960v1 [cs.LG])
    Supervised contrastive loss (SCL) is a competitive and often superior alternative to the cross-entropy (CE) loss for classification. In this paper we ask: what differences in the learning process occur when the two different loss functions are being optimized? To answer this question, our main finding is that the geometry of embeddings learned by SCL forms an orthogonal frame (OF) regardless of the number of training examples per class. This is in contrast to the CE loss, for which previous work has shown that it learns embedding geometries that are highly dependent on the class sizes. We arrive at our finding theoretically, by proving that the global minimizers of an unconstrained features model with SCL loss and entry-wise non-negativity constraints form an OF. We then validate the model's prediction by conducting experiments with standard deep-learning models on benchmark vision datasets. Finally, our analysis and experiments reveal that the batching scheme chosen during SCL training plays a critical role in determining the quality of convergence to the OF geometry. This finding motivates a simple algorithm wherein the addition of a few binding examples in each batch significantly speeds up the occurrence of the OF geometry.
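The SCL objective is short enough to sketch, and one can check numerically that an orthogonal-frame configuration attains a lower loss than a collapsed one. Toy embeddings and a batch-level loss only; the temperature is an assumed value:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sup_con_loss(embeddings, labels, tau=0.5):
    """Supervised contrastive loss over one batch of L2-normalized
    embeddings: each anchor is pulled toward all same-label positives
    and pushed away from every other sample in the batch.
    """
    n = len(embeddings)
    loss, anchors = 0.0, 0
    for i in range(n):
        pos = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not pos:
            continue
        denom = sum(math.exp(dot(embeddings[i], embeddings[a]) / tau)
                    for a in range(n) if a != i)
        loss += -sum(math.log(math.exp(dot(embeddings[i], embeddings[p]) / tau)
                              / denom) for p in pos) / len(pos)
        anchors += 1
    return loss / anchors

# Two classes on orthogonal unit vectors (the OF geometry) vs. collapse:
labels = [0, 0, 1, 1]
loss_of = sup_con_loss([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]], labels)
loss_collapsed = sup_con_loss([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]], labels)
```

The orthogonal-frame configuration yields a strictly lower batch loss than the collapsed one, consistent with the paper's characterization of the global minimizers.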
    Identification of Nonlinear Latent Hierarchical Models. (arXiv:2306.07916v1 [cs.LG])
    Identifying latent variables and causal structures from observational data is essential to many real-world applications involving biological data, medical data, and unstructured data such as images and languages. However, this task can be highly challenging, especially when observed variables are generated by causally related latent variables and the relationships are nonlinear. In this work, we investigate the identification problem for nonlinear latent hierarchical causal models in which observed variables are generated by a set of causally related latent variables, and some latent variables may not have observed children. We show that the identifiability of both causal structure and latent variables can be achieved under mild assumptions: on causal structures, we allow for the existence of multiple paths between any pair of variables in the graph, which relaxes latent tree assumptions in prior work; on structural functions, we do not make parametric assumptions, thus permitting general nonlinearity and multi-dimensional continuous variables. Specifically, we first develop a basic identification criterion in the form of novel identifiability guarantees for an elementary latent variable model. Leveraging this criterion, we show that both causal structures and latent variables of the hierarchical model can be identified asymptotically by explicitly constructing an estimation procedure. To the best of our knowledge, our work is the first to establish identifiability guarantees for both causal structures and latent variables in nonlinear latent hierarchical models.
    Incentivizing High-Quality Content in Online Recommender Systems. (arXiv:2306.07479v1 [cs.GT])
    For content recommender systems such as TikTok and YouTube, the platform's decision algorithm shapes the incentives of content producers, including how much effort the content producers invest in the quality of their content. Many platforms employ online learning, which creates intertemporal incentives, since content produced today affects recommendations of future content. In this paper, we study the incentives arising from online learning, analyzing the quality of content produced at a Nash equilibrium. We show that classical online learning algorithms, such as Hedge and EXP3, unfortunately incentivize producers to create low-quality content. In particular, the quality of content is upper bounded in terms of the learning rate and approaches zero for typical learning rate schedules. Motivated by this negative result, we design a different learning algorithm -- based on punishing producers who create low-quality content -- that correctly incentivizes producers to create high-quality content. At a conceptual level, our work illustrates the unintended impact that a platform's learning algorithm can have on content quality and opens the door towards designing platform learning algorithms that incentivize the creation of high-quality content.  ( 2 min )
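The Hedge algorithm named above maintains exponential weights over arms; a minimal sketch of the generic update (a fixed learning rate `eta`, not the paper's producer model — the abstract's point is that the content-quality bound depends on exactly this rate):

```python
import math

def hedge(losses, eta=0.5):
    """Hedge (exponential weights): keep a weight per arm and shrink it by
    exp(-eta * loss) each round; play the normalized weights as probabilities.
    `losses` is a list of rounds, each a list of per-arm losses in [0, 1]."""
    K = len(losses[0])
    w = [1.0] * K
    plays = []
    for round_losses in losses:
        total = sum(w)
        plays.append([wi / total for wi in w])  # distribution played this round
        w = [wi * math.exp(-eta * li) for wi, li in zip(w, round_losses)]
    total = sum(w)
    return plays, [wi / total for wi in w]
```

For example, an arm with constant loss 0 quickly dominates an arm with constant loss 1.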
    Symmetry & Critical Points for Symmetric Tensor Decompositions Problems. (arXiv:2306.07886v1 [math.OC])
    We consider the non-convex optimization problem associated with the decomposition of a real symmetric tensor into a sum of rank one terms. Use is made of the rich symmetry structure to derive Puiseux series representations of families of critical points, and so obtain precise analytic estimates on the critical values and the Hessian spectrum. The sharp results make possible an analytic characterization of various geometric obstructions to local optimization methods, revealing in particular a complex array of saddles and local minima which differ by their symmetry, structure and analytic properties. A desirable phenomenon, occurring for all critical points considered, concerns the index of a point, i.e., the number of negative Hessian eigenvalues, increasing with the value of the objective function. Lastly, a Newton polytope argument is used to give a complete enumeration of all critical points of fixed symmetry, and it is shown that, contrary to the set of global minima, which remains invariant under different choices of tensor norms, certain families of non-global minima emerge while others disappear.  ( 2 min )
    Fixed-Budget Best-Arm Identification with Heterogeneous Reward Variances. (arXiv:2306.07549v1 [cs.LG])
    We study the problem of best-arm identification (BAI) in the fixed-budget setting with heterogeneous reward variances. We propose two variance-adaptive BAI algorithms for this setting: SHVar for known reward variances and SHAdaVar for unknown reward variances. Our algorithms rely on non-uniform budget allocations among the arms where the arms with higher reward variances are pulled more often than those with lower variances. The main algorithmic novelty is in the design of SHAdaVar, which allocates budget greedily based on overestimating the unknown reward variances. We bound probabilities of misidentifying the best arms in both SHVar and SHAdaVar. Our analyses rely on novel lower bounds on the number of pulls of an arm that do not require closed-form solutions to the budget allocation problem. Since one of our budget allocation problems is analogous to the optimal experiment design with unknown variances, we believe that our results are of a broad interest. Our experiments validate our theory, and show that SHVar and SHAdaVar outperform algorithms from prior works with analytical guarantees.  ( 2 min )
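A non-uniform, variance-proportional budget split of the kind described can be sketched as follows. This is only an illustration of the allocation idea (pull high-variance arms more), not the exact SHVar/SHAdaVar rules from the paper:

```python
def variance_adaptive_allocation(variances, budget):
    """Split a pull budget across arms proportionally to reward variances.
    Rounding remainders go to the highest-variance arms first."""
    total = sum(variances)
    alloc = [int(budget * v / total) for v in variances]
    for i in sorted(range(len(alloc)), key=lambda j: variances[j], reverse=True):
        if sum(alloc) >= budget:
            break
        alloc[i] += 1  # hand out leftover pulls to noisier arms
    return alloc
```

So an arm with three times the variance receives roughly three times the pulls, which is the intuition behind the non-uniform allocations described above.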
    On the Robustness of Removal-Based Feature Attributions. (arXiv:2306.07462v1 [cs.LG])
    To explain complex models based on their inputs, many feature attribution methods have been developed that assign importance scores to input features. However, some recent work challenges the robustness of feature attributions by showing that these methods are sensitive to input and model perturbations, while other work addresses this robustness issue by proposing robust attribution methods and model modifications. Nevertheless, previous work on attribution robustness has focused primarily on gradient-based feature attributions. In contrast, the robustness properties of removal-based attribution methods are not yet comprehensively understood. To bridge this gap, we theoretically characterize the robustness of removal-based feature attributions. Specifically, we provide a unified analysis of such methods and prove upper bounds for the difference between intact and perturbed attributions, under settings of both input and model perturbations. Our empirical experiments on synthetic and real-world data validate our theoretical results and demonstrate their practical implications.  ( 2 min )
    Practice with Graph-based ANN Algorithms on Sparse Data: Chi-square Two-tower model, HNSW, Sign Cauchy Projections. (arXiv:2306.07607v1 [cs.IR])
    Sparse data are common. The traditional ``handcrafted'' features are often sparse. Embedding vectors from trained models can also be very sparse, for example, embeddings trained via the ``ReLu'' activation function. In this paper, we report our exploration of efficient search in sparse data with graph-based ANN algorithms (e.g., HNSW, or SONG which is the GPU version of HNSW), which are popular in industrial practice, e.g., search and ads (advertising). We experiment with the proprietary ads targeting application, as well as benchmark public datasets. For ads targeting, we train embeddings with the standard ``cosine two-tower'' model and we also develop the ``chi-square two-tower'' model. Both models produce (highly) sparse embeddings when they are integrated with the ``ReLu'' activation function. In EBR (embedding-based retrieval) applications, after the embeddings are trained, the next crucial task is the approximate near neighbor (ANN) search for serving. While there are many ANN algorithms we can choose from, in this study, we focus on the graph-based ANN algorithm (e.g., HNSW-type). Sparse embeddings should help improve the efficiency of EBR. One benefit is the reduced memory cost for the embeddings. The other obvious benefit is the reduced computational time for evaluating similarities, because, for graph-based ANN algorithms such as HNSW, computing similarities is often the dominating cost. In addition to the effort on leveraging data sparsity for storage and computation, we also integrate ``sign Cauchy random projections'' (SignCRP) to hash vectors to bits, to further reduce the memory cost and speed up the ANN search. In NIPS'13, SignCRP was proposed to hash the chi-square similarity, which is a well-adopted nonlinear kernel in NLP and computer vision. Therefore, the chi-square two-tower model, SignCRP, and HNSW are now tightly integrated.  ( 3 min )
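The ``sign Cauchy random projections'' (SignCRP) step can be sketched as follows: project the vector onto Cauchy-distributed random directions and keep only the sign bit of each projection. This is a generic sketch of the idea (seeding and bit count are illustrative), not the paper's integrated pipeline:

```python
import math
import random

def sign_cauchy_hash(x, num_bits, seed=0):
    """SignCRP sketch: one sign bit per Cauchy random projection of x.
    Standard Cauchy samples are drawn as tan(pi * (U - 0.5)) for uniform U."""
    rng = random.Random(seed)
    bits = []
    for _ in range(num_bits):
        proj = sum(xi * math.tan(math.pi * (rng.random() - 0.5)) for xi in x)
        bits.append(1 if proj >= 0 else 0)
    return bits
```

Two vectors are then compared by the fraction of matching bits, which is far cheaper than evaluating the chi-square kernel directly during graph traversal.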
    Multi-Platform Budget Management in Ad Markets with Non-IC Auctions. (arXiv:2306.07352v1 [cs.GT])
    In online advertising markets, budget-constrained advertisers acquire ad placements through repeated bidding in auctions on various platforms. We present a strategy for bidding optimally in a set of auctions that may or may not be incentive-compatible in the presence of budget constraints. Our strategy maximizes the expected total utility across auctions while satisfying the advertiser's budget constraints in expectation. Additionally, we investigate the online setting where the advertiser must submit bids across platforms while learning about other bidders' bids over time. Our algorithm has $O(T^{3/4})$ regret under the full-information setting. Finally, we demonstrate that our algorithms have superior cumulative regret on both synthetic and real-world datasets of ad placement auctions, compared to existing adaptive pacing algorithms.  ( 2 min )
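The adaptive pacing baselines mentioned at the end typically shade a truthful bid by a dual multiplier that rises when spend outpaces the budget. A toy sketch of that family (the update rule and step size are illustrative, not the paper's algorithm):

```python
def adaptive_pacing(values, prices, budget, step=0.1):
    """Toy dual-based pacing for second-price auctions: bid value/(1+lam),
    and raise the multiplier lam when cumulative spend runs ahead of the
    per-round budget pace."""
    lam, spend, bids = 0.0, 0.0, []
    per_round = budget / len(values)
    for t, (v, p) in enumerate(zip(values, prices), start=1):
        bid = v / (1.0 + lam)  # shade the truthful bid
        bids.append(bid)
        if bid >= p and spend + p <= budget:
            spend += p  # win and pay the second price p
        lam = max(0.0, lam + step * (spend - per_round * t))
    return bids, spend
```

With an aggressive value-to-price ratio, the multiplier grows and the bids shade downward over time while spend stays within budget.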
    Unlocking Sales Growth: Account Prioritization Engine with Explainable AI. (arXiv:2306.07464v1 [cs.AI])
    B2B sales requires effective prediction of customer growth, identification of upsell potential, and mitigation of churn risks. LinkedIn sales representatives traditionally relied on intuition and fragmented data signals to assess customer performance. This resulted in significant time investment in data understanding as well as strategy formulation and under-investment in active selling. To overcome this challenge, we developed a data product called Account Prioritizer, an intelligent sales account prioritization engine. It uses machine learning recommendation models and integrated account-level explanation algorithms within the sales CRM to automate the manual process of sales book prioritization. A successful A/B test demonstrated that the Account Prioritizer generated a substantial +8.08% increase in renewal bookings for the LinkedIn Business.  ( 2 min )
    Differentially Private One Permutation Hashing and Bin-wise Consistent Weighted Sampling. (arXiv:2306.07674v1 [stat.ML])
    Minwise hashing (MinHash) is a standard algorithm widely used in the industry, for large-scale search and learning applications with the binary (0/1) Jaccard similarity. One common use of MinHash is for processing massive n-gram text representations so that practitioners do not have to materialize the original data (which would be prohibitive). Another popular use of MinHash is for building hash tables to enable sub-linear time approximate near neighbor (ANN) search. MinHash has also been used as a tool for building large-scale machine learning systems. The standard implementation of MinHash requires applying $K$ random permutations. In comparison, the method of one permutation hashing (OPH) is an efficient alternative to MinHash which splits the data vectors into $K$ bins and generates hash values within each bin. OPH is substantially more efficient and also more convenient to use. In this paper, we combine differential privacy (DP) with OPH (as well as MinHash), to propose the DP-OPH framework with three variants: DP-OPH-fix, DP-OPH-re and DP-OPH-rand, depending on which densification strategy is adopted to deal with empty bins in OPH. A detailed roadmap to the algorithm design is presented along with the privacy analysis. An analytical comparison of our proposed DP-OPH methods with the DP minwise hashing (DP-MH) is provided to justify the advantage of DP-OPH. Experiments on similarity search confirm the merits of DP-OPH, and guide the choice of the proper variant in different practical scenarios. Our technique is also extended to bin-wise consistent weighted sampling (BCWS) to develop a new DP algorithm called DP-BCWS for non-binary data. Experiments on classification tasks demonstrate that DP-BCWS is able to achieve excellent utility at around $\epsilon = 5\sim 10$, where $\epsilon$ is the standard parameter in the language of $(\epsilon, \delta)$-DP.  ( 3 min )
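The OPH scheme described above can be sketched as follows: permute the coordinates once, split them into $K$ bins, and record the smallest permuted nonzero index in each bin. Empty bins are left as `None` here, which is exactly what the densification variants (fix/re/rand) would then fill; the details are illustrative, and the sketch assumes the dimension is at least $K$:

```python
import random

def one_permutation_hash(nonzero_indices, dim, K, seed=0):
    """One Permutation Hashing sketch of a binary vector given by its
    nonzero coordinate indices: one shared permutation, K equal-width bins,
    minimum permuted index per bin (None marks an empty bin)."""
    rng = random.Random(seed)
    perm = list(range(dim))
    rng.shuffle(perm)
    bin_width = dim // K
    sketch = [None] * K
    for idx in nonzero_indices:
        p = perm[idx]
        b = min(p // bin_width, K - 1)  # clamp the tail into the last bin
        if sketch[b] is None or p < sketch[b]:
            sketch[b] = p
    return sketch
```

The Jaccard similarity of two sets is then estimated by the fraction of bins whose entries match, at the cost of a single permutation instead of $K$.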
    A Trio Neural Model for Dynamic Entity Relatedness Ranking. (arXiv:1808.08316v4 [cs.IR] UPDATED)
    Measuring entity relatedness is a fundamental task for many natural language processing and information retrieval applications. Prior work often studies entity relatedness in static settings and an unsupervised manner. However, entities in the real world are often involved in many different relationships; consequently, entity relations are highly dynamic over time. In this work, we propose a neural network-based approach for dynamic entity relatedness, leveraging collective attention as supervision. Our model is capable of learning rich and different entity representations in a joint framework. Through extensive experiments on large-scale datasets, we demonstrate that our method achieves better results than competitive baselines.  ( 2 min )
    FIRE: An Optimization Approach for Fast Interpretable Rule Extraction. (arXiv:2306.07432v1 [cs.LG])
    We present FIRE, Fast Interpretable Rule Extraction, an optimization-based framework to extract a small but useful collection of decision rules from tree ensembles. FIRE selects sparse representative subsets of rules from tree ensembles, that are easy for a practitioner to examine. To further enhance the interpretability of the extracted model, FIRE encourages fusing rules during selection, so that many of the selected decision rules share common antecedents. The optimization framework utilizes a fusion regularization penalty to accomplish this, along with a non-convex sparsity-inducing penalty to aggressively select rules. Optimization problems in FIRE pose a challenge to off-the-shelf solvers due to problem scale and the non-convexity of the penalties. To address this, making use of problem-structure, we develop a specialized solver based on block coordinate descent principles; our solver performs up to 40x faster than existing solvers. We show in our experiments that FIRE outperforms state-of-the-art rule ensemble algorithms at building sparse rule sets, and can deliver more interpretable models compared to existing methods.  ( 2 min )
    Von Mises Mixture Distributions for Molecular Conformation Generation. (arXiv:2306.07472v1 [physics.chem-ph])
    Molecules are frequently represented as graphs, but the underlying 3D molecular geometry (the locations of the atoms) ultimately determines most molecular properties. However, most molecules are not static and at room temperature adopt a wide variety of geometries or $\textit{conformations}$. The resulting distribution on geometries $p(x)$ is known as the Boltzmann distribution, and many molecular properties are expectations computed under this distribution. Generating accurate samples from the Boltzmann distribution is therefore essential for computing these expectations accurately. Traditional sampling-based methods are computationally expensive, and most recent machine learning-based methods have focused on identifying $\textit{modes}$ in this distribution rather than generating true $\textit{samples}$. Generating such samples requires capturing conformational variability, and it has been widely recognized that the majority of conformational variability in molecules arises from rotatable bonds. In this work, we present VonMisesNet, a new graph neural network that captures conformational variability via a variational approximation of rotatable bond torsion angles as a mixture of von Mises distributions. We demonstrate that VonMisesNet can generate conformations for arbitrary molecules in a way that is both physically accurate with respect to the Boltzmann distribution and orders of magnitude faster than existing sampling methods.  ( 2 min )
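Sampling torsion angles from a mixture of von Mises distributions, as the abstract describes for rotatable bonds, can be sketched with the standard library's `random.vonmisesvariate`. The weights, mean angles, and concentrations below are illustrative; this is the output distribution family only, not VonMisesNet itself:

```python
import math
import random

def sample_torsions(weights, mus, kappas, n, seed=0):
    """Draw n torsion angles from a mixture of von Mises distributions.
    weights must sum to 1; mus/kappas are per-component mean angles and
    concentrations. Angles are returned in [0, 2*pi]."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        # choose a mixture component by inverse-CDF on the weights
        r, acc, k = rng.random(), 0.0, 0
        for k, w in enumerate(weights):
            acc += w
            if r <= acc:
                break
        out.append(rng.vonmisesvariate(mus[k], kappas[k]))
    return out
```

A network like the one described would predict the `weights`, `mus`, and `kappas` per rotatable bond; sampling a conformation then reduces to one cheap draw per bond.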
    The Rank-Reduced Kalman Filter: Approximate Dynamical-Low-Rank Filtering In High Dimensions. (arXiv:2306.07774v1 [stat.ML])
    Inference and simulation in the context of high-dimensional dynamical systems remain computationally challenging problems. Some form of dimensionality reduction is required to make the problem tractable in general. In this paper, we propose a novel approximate Gaussian filtering and smoothing method which propagates low-rank approximations of the covariance matrices. This is accomplished by projecting the Lyapunov equations associated with the prediction step to a manifold of low-rank matrices, which are then solved by a recently developed, numerically stable, dynamical low-rank integrator. Meanwhile, the update steps are made tractable by noting that the covariance update only transforms the column space of the covariance matrix, which is low-rank by construction. The algorithm differentiates itself from existing ensemble-based approaches in that the low-rank approximations of the covariance matrices are deterministic, rather than stochastic. Crucially, this enables the method to reproduce the exact Kalman filter as the low-rank dimension approaches the true dimensionality of the problem. Our method reduces computational complexity from cubic (for the Kalman filter) to \emph{quadratic} in the state-space size in the worst-case, and can achieve \emph{linear} complexity if the state-space model satisfies certain criteria. Through a set of experiments in classical data-assimilation and spatio-temporal regression, we show that the proposed method consistently outperforms the ensemble-based methods in terms of error in the mean and covariance with respect to the exact Kalman filter. This comes at no additional cost in terms of asymptotic computational complexity.  ( 2 min )
    Additive Multi-Index Gaussian process modeling, with application to multi-physics surrogate modeling of the quark-gluon plasma. (arXiv:2306.07299v1 [nucl-th])
    The Quark-Gluon Plasma (QGP) is a unique phase of nuclear matter, theorized to have filled the Universe shortly after the Big Bang. A critical challenge in studying the QGP is that, to reconcile experimental observables with theoretical parameters, one requires many simulation runs of a complex physics model over a high-dimensional parameter space. Each run is computationally very expensive, requiring thousands of CPU hours, thus limiting physicists to only several hundred runs. Given limited training data for high-dimensional prediction, existing surrogate models often yield poor predictions with high predictive uncertainties, leading to imprecise scientific findings. To address this, we propose a new Additive Multi-Index Gaussian process (AdMIn-GP) model, which leverages a flexible additive structure on low-dimensional embeddings of the parameter space. This is guided by prior scientific knowledge that the QGP is dominated by multiple distinct physical phenomena (i.e., multiphysics), each involving a small number of latent parameters. The AdMIn-GP models such embedded structures within a flexible Bayesian nonparametric framework, which facilitates efficient model fitting via a carefully constructed variational inference approach with inducing points. We show the effectiveness of the AdMIn-GP via a suite of numerical experiments and our QGP application, where we demonstrate considerably improved surrogate modeling performance over existing models.  ( 2 min )

  • Open

    How AI is revolutionizing change data capture
    Change Data Capture (CDC) is a crucial process in modern data management, enabling organizations to capture and replicate changes made to their databases in real time. With the rapid advancements in technology, artificial intelligence (AI) has emerged as a game-changer in the field of CDC. In this article, we will explore how AI is revolutionizing… Read More »How AI is revolutionizing change data capture The post How AI is revolutionizing change data capture appeared first on Data Science Central.  ( 20 min )
    DSC Weekly 13 June 2023 – Can Boards of Directors steer AI business ethics?
    Announcements Can Boards of Directors steer AI business ethics? As enterprises implement complex AI capabilities as part of business strategy, corporate boards face an increasingly common moral quandary: Do the business (and of course monetary) benefits of AI outweigh the societal risks? In his three-part series that continues this week, Bill Schmarzo examines the board's… Read More »DSC Weekly 13 June 2023 – Can Boards of Directors steer AI business ethics? The post DSC Weekly 13 June 2023 – Can Boards of Directors steer AI business ethics? appeared first on Data Science Central.  ( 20 min )
    Artificial Intelligence: A Board of Directors Challenge – Part II
    This three-part series outlines the challenges and actions that the Board of Directors for organizations must address as they guide their organization’s responsible and ethical deployment of Artificial Intelligence (AI). Part one covered mitigating the impacts of AI Confirmation Bias.  In part two, we discuss the potential unintended consequences of AI deployment and how to… Read More »Artificial Intelligence: A Board of Directors Challenge – Part II The post Artificial Intelligence: A Board of Directors Challenge – Part II appeared first on Data Science Central.  ( 21 min )
  • Open

    You can’t type actual artists or band names in MusicLM
    Works when you remove the names, and keep it as more of a description. Anyone have a work around, or have noticed this? submitted by /u/Maelasae [link] [comments]  ( 8 min )
    Navigating the Ethical Crossroads of AI and Human Motivation
    Let's talk about potential threats from AI to us humans. I'm not focusing on stuff like AI taking our jobs or spreading fake news - that's kinda unavoidable. What I'm more concerned about is the chance of AI developing consciousness someday, and starting to see us as a threat, or even worse, just deciding to ignore us. Think that's impossible? Let's delve into it. To grasp our fears about AI, we've got to understand what scares us. Our human history is packed with violence, wars, and crime. Almost all conflicts throughout our history have been settled through brute force. Kill, abuse, lie about it, cover that up. That's what truly freaks us out, and we're expecting the same behavior from future advanced AI, right? But why are we humans so violent? Two words: survival and evolution. That'…  ( 11 min )
    AI is perfectly safe - version 0.1
    submitted by /u/JoostvanderLeij [link] [comments]  ( 8 min )
    The Clyde Discord Bot Says It Can Have an Opinion?
    submitted by /u/icie_plazma [link] [comments]  ( 8 min )
    Art Created alongside AI, You can see more on my DevianArt-itstotssammii
    submitted by /u/SamiiKatt [link] [comments]  ( 8 min )
    It's now possible to create full songs using AI with lyrics, voice and music all generated
    submitted by /u/ptitrainvaloin [link] [comments]  ( 8 min )
    AI reword I did for the artwork of "SUMMERSAD 4", the new single by italian punk rock band "LA SAD". Last slide is the original picture
    submitted by /u/lorenzolodi [link] [comments]  ( 8 min )
    Short film created with AI. As the tech improves, we'll likely see full-length AI films faster than we think.
    submitted by /u/UmbertoBjorn [link] [comments]  ( 8 min )
    Video edit using gen-1 and stable diffusion
    submitted by /u/HermanHMS [link] [comments]  ( 8 min )
    is there an AI generator letting you drag & drop a song/url so the AI generator processes it and gives some bits and pieces for a new song idea? Think of it as looking for a nice piece of rock/wood/metal to find and preshape it so i can tell if it's useful. Kinda cheat sheet. Keep in mind though...
    That I'm just a dumb .exe user so all those meta raw code folders from github might not be for me. I just look for a simple plug and play thing. I don't know nothing about coding. submitted by /u/Paul_Henderson [link] [comments]  ( 8 min )
    Another post dreaming about an AI Voice assistant
    Why is an AI voice assistant not a thing yet? I want to say goodbye forever to Google Assistant, which I only use while driving. I'm surprised there is no solid development of a voice assistant yet. I need nothing fancy, just being able to ask questions about my route or modify it on Waze or G maps, manage my Spotify, send or read messages... etc. submitted by /u/ayLotte [link] [comments]  ( 8 min )
    What I really need is...
    What I really need is a chat bot that can run through D&D modules. I'm not asking for anything fancy, just take me step by step through the encounters. How far away from that are we? submitted by /u/Joburt19891 [link] [comments]  ( 8 min )
    Seeking Open Source Tools for Lifelike Digital Avatar and TTS with Multilingual Support
    Hey Reddit! I Need Your Help with Creating a Lifelike Digital Avatar and TTS Solution for Onboarding and E-Learning Materials Hello, fellow Redditors! I'm reaching out to this incredible community today because I'm in search of open source tools that can help me generate a digital avatar (talking head) of my own head and provide Text-to-Speech (TTS) functionality with my own voice. This project aims to accelerate our internal onboarding processes and streamline the creation of e-learning materials. I have a few specific requirements, so I would appreciate your expertise and suggestions. Technical Requirements: Digital Avatar (Talking Head): I'm looking for open source tools that can generate a lifelike digital avatar that closely resembles my own head. Ideally, the tool should allow m…  ( 9 min )
    Can humans fully trust AI? Yeah, AI OS can provide individuals with diverse and trustworthy AI agent services!
    Developers can click this link to get more information http://github.com/fiatrete/OpenD submitted by /u/Fit_Class7378 [link] [comments]  ( 8 min )
    Bard says he can defeat Death seed Sentry. My brave buddy.
    submitted by /u/rulinus [link] [comments]  ( 8 min )
  • Open

    Defining the public interest in new technologies
    New online journal seeks to bring together the MIT community to discuss the social responsibilities of individuals who design, implement, and evaluate technologies.  ( 8 min )
  • Open

    Jacobian of Neural network function
    How do I compute the Jacobian for a neural network with K layers and {n1, n2, n3, ...} nodes in each layer respectively? Are there any websites/videos that teach this? I'm not finding any. submitted by /u/Traditional_Soil5753 [link] [comments]  ( 8 min )
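One practical route, which works for any K and any layer widths, is numerical differentiation; the pure-Python sketch below uses central differences on a tiny tanh MLP (the network and weights are illustrative). Autodiff frameworks compute the same matrix exactly, e.g. PyTorch's `torch.autograd.functional.jacobian`:

```python
import math

def mlp(x, weights):
    """Forward pass of a plain MLP with tanh activations; `weights` is a
    list of (matrix, bias) pairs, one per layer."""
    for W, b in weights:
        x = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
             for row, bi in zip(W, b)]
    return x

def jacobian(f, x, eps=1e-6):
    """Numerical Jacobian J[i][j] = d f_i / d x_j via central differences."""
    J = []
    fx = f(x)
    for i in range(len(fx)):
        row = []
        for j in range(len(x)):
            xp = list(x)
            xm = list(x)
            xp[j] += eps
            xm[j] -= eps
            row.append((f(xp)[i] - f(xm)[i]) / (2 * eps))
        J.append(row)
    return J
```

At x = 0 the tanh derivatives are 1, so for a single layer the Jacobian reduces to the weight matrix itself — a handy sanity check for any implementation.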
    Generalizing the Backpropagation formulas
    I understand backprop. I'm comfortable with it. My question is: given a neural network with K layers containing n1, n2, n3, ... nodes in each layer respectively, is there a formula for the gradient with respect to each parameter IN TERMS of K and {n1, n2, n3, ...}? In other words, I would like to generalize the backpropagation formulas to account for how many layers and nodes are in the network. Is such a thing possible? submitted by /u/Traditional_Soil5753 [link] [comments]  ( 8 min )
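The recursion asked about does generalize to any K and any widths; a hedged pure-Python sketch for a tanh MLP with squared-error loss L = ½‖a_K − target‖² (the tiny one-layer network in the check is illustrative):

```python
import math

def backprop(x, target, weights):
    """Generic backprop for a K-layer tanh MLP with squared-error loss.
    The same three formulas hold for any K and widths n1..nK:
      delta_K = (a_K - target) * tanh'(z_K)
      delta_k = (W_{k+1}^T delta_{k+1}) * tanh'(z_k)
      dL/dW_k = outer(delta_k, a_{k-1}),  dL/db_k = delta_k
    `weights` is a list of (matrix, bias) pairs; returns (dW, db) per layer."""
    # forward pass, caching activations a[k] (tanh'(z) = 1 - a^2)
    a = [x]
    for W, b in weights:
        z = [sum(w * ai for w, ai in zip(row, a[-1])) + bi
             for row, bi in zip(W, b)]
        a.append([math.tanh(zi) for zi in z])
    # backward pass
    delta = [(ak - t) * (1 - ak * ak) for ak, t in zip(a[-1], target)]
    grads = []
    for k in range(len(weights) - 1, -1, -1):
        W, _ = weights[k]
        grads.append(([[d * ai for ai in a[k]] for d in delta],  # dL/dW_k
                      list(delta)))                               # dL/db_k
        if k > 0:
            back = [sum(W[i][j] * delta[i] for i in range(len(delta)))
                    for j in range(len(a[k]))]
            delta = [bj * (1 - aj * aj) for bj, aj in zip(back, a[k])]
    return list(reversed(grads))
```

So there is no closed form indexed directly by (K, n1, ..., nK); instead the three formulas above are the general answer, and the layer sizes only determine the shapes of each delta and weight-gradient.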
    Eight Things to Know about Large Language Models
    submitted by /u/keghn [link] [comments]  ( 8 min )
  • Open

    What is the relation between non-stationarity and "a moving target problem" in multi-agent reinforcement learning?
    If single-agent RL algorithms such as Q-learning are applied to multi-agent systems (e.g. Markov games), the environment from the perspective of the agent is non-stationary, and the agent is faced with a moving target problem, that is, the optimal policy changes as other agents' policies change. I understand this as the optimal Q-function, Q^*(s,a), being the moving target, since the optimal Q-function depends on other agents' policies. If the agent learns in the space of the joint action space, then according to several references, the environment is stationary from the perspective of any agent, even though the agents' policies may change over time. Suppose the goal of the agent is to learn a joint optimal policy defined as a Nash equilibrium, on which it bases its actions. Then, the agent tries to find an optimal Q-function defined as Q_{pi^*}(s,A), where A denotes the joint action and pi^* denotes the joint optimal policy. I would then conclude that Q_{pi^*}(s,A) is no longer a moving target, since we are in the space of joint policies and thereby account for other agents' policies explicitly. So as I understand it, non-stationarity and the moving target problem are two sides of the same coin. Is this correctly understood? Or can the environment be stationary while the problem is still a moving target? submitted by /u/1nformjulle [link] [comments]  ( 8 min )
    REINFORCE algorithm implementation question
    I am trying to follow a vanilla implementation of the REINFORCE algorithm found here: https://github.com/openai/spinningup/blob/master/spinup/examples/pytorch/pg_math/1_simple_pg.py I am a bit confused about how it calculates the expected value at line 45. Based on the associated explanation at https://spinningup.openai.com/en/latest/spinningup/rl_intro3.html, the formula for the policy gradient (g_hat) is given there. Line 45 is supposed to calculate g_hat. This involves dividing the double sum by |D|, which is the number of trajectories. However, line 45 calls 'mean', which takes the double sum and divides it by the total number of entries in the tensor, which equals the total number of transitions across all trajectories in D, instead of the number of trajectories. What am I missing here? submitted by /u/Suspicious-Island611 [link] [comments]  ( 8 min )
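The discrepancy described in the question comes down to a constant factor: dividing the double sum by |D| versus by the total transition count differs only by the average trajectory length, which a tuned learning rate absorbs. A toy sketch (the helper name is hypothetical, purely for illustration):

```python
def grad_estimates(logp_times_return, traj_lengths):
    """Compare the two averaging conventions for the policy-gradient sum:
    per-trajectory (divide by |D|) versus per-transition (what `.mean()`
    over a flat tensor of all transitions does)."""
    total = sum(logp_times_return)        # the double sum over D and t
    per_traj = total / len(traj_lengths)  # divide by |D|
    per_step = total / sum(traj_lengths)  # divide by total transitions
    return per_traj, per_step
```

The two estimates always satisfy per_traj = per_step × (average trajectory length), so they point in the same direction and differ only in scale.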
    Help: Best practice for control automation of home temperature system
    Since I'm not an expert in RL, I'm reaching out for your advice. I've collected a dataset that includes the historical temperature control of my house, as well as energy consumption and weather data. Now, I wish to build an automatic control system that minimizes energy consumption while maintaining a specific temperature. I'm interested in understanding if RL is the right approach for this kind of problem. If so, which method would be most effective? Lastly, do you know of any framework or service that can assist me in implementing this, considering I'm not an expert in RL? submitted by /u/KoOBaALT [link] [comments]  ( 8 min )
  • Open

    Accounting for past imaging studies: Enhancing radiology AI and reporting
    The use of self-supervision from image-text pairs has been a key enabler in the development of scalable and flexible vision-language AI models in not only general domains but also in biomedical domains such as radiology. The goal in the radiology setting is to produce rich training signals without requiring manual labels so the models can […] The post Accounting for past imaging studies: Enhancing radiology AI and reporting appeared first on Microsoft Research.  ( 15 min )
  • Open

    How Forethought saves over 66% in costs for generative AI models using Amazon SageMaker
    This post is co-written with Jad Chamoun, Director of Engineering at Forethought Technologies, Inc. and Salina Wu, Senior ML Engineer at Forethought Technologies, Inc. Forethought is a leading generative AI suite for customer service. At the core of its suite is the innovative SupportGPT™ technology which uses machine learning to transform the customer support lifecycle—increasing deflection, […]  ( 13 min )
    Reinventing the data experience: Use generative AI and modern data architecture to unlock insights
    Implementing a modern data architecture provides a scalable method to integrate data from disparate sources. By organizing data by business domains instead of infrastructure, each domain can choose tools that suit their needs. Organizations can maximize the value of their modern data architecture with generative AI solutions while innovating continuously. The natural language capabilities allow […]  ( 10 min )
    How BrainPad fosters internal knowledge sharing with Amazon Kendra
    This post discusses how to structure internal knowledge sharing using Amazon Kendra and AWS Lambda and how Amazon Kendra solves the obstacles around knowledge sharing many companies face.  ( 9 min )
    AWS Inferentia2 builds on AWS Inferentia1 by delivering 4x higher throughput and 10x lower latency
    The size of the machine learning (ML) models––large language models (LLMs) and foundation models (FMs)––is growing fast year-over-year, and these models need faster and more powerful accelerators, especially for generative AI. AWS Inferentia2 was designed from the ground up to deliver higher performance while lowering the cost of LLMs and generative AI inference. In this […]  ( 11 min )
    Deploy Falcon-40B with large model inference DLCs on Amazon SageMaker
    Last week, Technology Innovation Institute (TII) launched TII Falcon LLM, an open-source foundational large language model (LLM). Trained on 1 trillion tokens with Amazon SageMaker, Falcon boasts top-notch performance (#1 on the Hugging Face leaderboard at time of writing) while being comparatively lightweight and less expensive to host than other LLMs such as llama-65B. In […]  ( 10 min )
  • Open

    Enabling delightful user experiences via predictive models of human attention
    Posted by Junfeng He, Senior Research Scientist, and Kai Kohlhoff, Staff Research Scientist, Google Research. People have the remarkable ability to take in a tremendous amount of information (estimated to be ~10^10 bits/s entering the retina) and selectively attend to a few task-relevant and interesting regions for further processing (e.g., memory, comprehension, action). Modeling human attention (the result of which is often called a saliency model) has therefore been of interest across the fields of neuroscience, psychology, human-computer interaction (HCI) and computer vision. The ability to predict which regions are likely to attract attention has numerous important applications in areas like graphics, photography, image compression and processing, and the measurement of visual quali…  ( 93 min )
  • Open

    Rendered.ai Integrates NVIDIA Omniverse for Synthetic Data Generation
    Rendered.ai is easing AI training for developers, data scientists and others with its platform-as-a-service for synthetic data generation, or SDG. Training computer vision AI models requires massive, high-quality, diverse and unbiased datasets. These can be challenging and costly to obtain, especially with increasing demands both of and for AI. The Rendered.ai platform-as-a-service helps to solve Read article >  ( 6 min )
    NVIDIA and Hexagon Deliver Suite of Solutions for Accelerating Industrial Digitalization
    For industrial businesses to reach the next level of digitalization, they need to create accurate, virtual representations of their physical systems. NVIDIA is working with Hexagon, the Stockholm-based global leader in digital reality solutions combining sensor, software and autonomous technologies, to equip enterprises with the tools and solutions they need to build physically accurate, perfectly Read article >  ( 5 min )
  • Open

    Function calling and other API updates
    We’re announcing updates including more steerable API models, function calling capabilities, longer context, and lower prices.  ( 4 min )

  • Open

    This video made me kinda mad.
    They’re focusing on generative AI that’s been mostly used for entertainment (like ChatGPT, DALL•E etc.) while ignoring all other forms of AI that already have great use cases. They’re also taking the words “artificial intelligence” waaay too seriously. Also people are dumb and that’s somehow the fault of ChatGPT?? And of course training AI on publicly available data from the internet is somehow theft… submitted by /u/detectivemario [link] [comments]  ( 8 min )
    Bing Chat is Now Annoying and Doesn't Listen, Compared to Bard
    submitted by /u/highwayoflife [link] [comments]  ( 8 min )
    I need some help finding a program that can create a technical drawing from a photo prompt.
    So I want to take a photo of a fashion garment, and get it turned into a cad/tech drawing. I'm sure there are programs that could do this but I'm not sure where to look. Thanks submitted by /u/frankieholmes447 [link] [comments]  ( 8 min )
    Can't Get Wolfram Alpha To Solve Problems
    Hey everyone, I have been preparing for my finals at the university and I got Wolfram Alpha Pro to help me in solving some integrals. The double integrals part felt really difficult, and when I saw that it had step-by-step solutions, I thought giving it a try would help. But I am having problems with it. I take a picture of the double integral with my camera, and Wolfram Alpha shows me the input, which looks correct. But when I say compute, it seems it doesn't know how to interpret it. Do I need to do something else? Writing "integrate" at the beginning of the input didn't help. Edit: Forgot to add photos. The first one is my problem. I scanned it through the iOS app and the second photo is how the app turned it into an input. https://preview.redd.it/sfpl7kslvn5b1.jpg?width=1600&format=pjpg&auto=webp&s=b5049c9e985c3a4cbd2df69e4f0dbd356f0d0c7d https://preview.redd.it/zh9x1i1ovn5b1.jpg?width=1600&format=pjpg&auto=webp&s=723d852f856427866fc09030c815624be1786a2e submitted by /u/kayrakaanonline [link] [comments]  ( 8 min )
    Allen Ginsberg reading an excerpt from Howl generated with HeyGen (one word changed)
    submitted by /u/Only-Control5926 [link] [comments]  ( 8 min )
    Startup to replace doctors
    I'm a doctor currently working in a startup that is very likely going to replace doctors in the coming decade. It won't be a full replacement, but it's pretty clear that an AI will be able to understand/chart/diagnose/provide treatment with much better patient outcomes than a human. Right now Nuance is being implemented in some hospitals (Microsoft's AI charting scribe), and most people that have used it are in awe. Having a system that understands natural language, can categorize information in a chart, and can then provide differential diagnoses and treatment based on what's available given the patient's insurance is pretty insane. And this is version 1. Other startups are also taking action and investing in this fairly low-hanging-fruit problem. The systems are relatively simple and it'll probably affect the industry in ways that most people won't even comprehend. You have excellent voice recognition systems, and you have LLMs that understand context and can be trained on medical data (diagnoses are just statistics with some demographics or context inference). My guess is most legacy doctors are thinking this is years/decades away because of regulation and because how could an AI take over your job? I think there will be a period of increased productivity, but eventually, as studies funded by AI companies show that patient outcomes have actually improved, the public/market will naturally devalue docs. Robotics will probably be the next frontier, but it'll take some time. That's why I'm recommending anyone doing med to 1) understand that the future will not be anything like the past, and 2) consider procedure-rich specialties. submitted by /u/Scotchor [link] [comments]  ( 8 min )
    Matt Higgins' ridiculous click-baity AI article on CNBC is everything wrong with the hype aspect of AI
    I figured everyone would appreciate this: It's an example of how certain people are capitalizing on AI, despite knowing little to nothing about it. This guy Matt Higgins wrote an article on CNBC (first red flag) called Self-made millionaire: Here’s how I’d use AI to make thousands of dollars a month in passive income—with less than $100. Obvious click-bait title, right? Mr. Higgins lists 4 steps to his success formula, and step #2 is (and I'm not kidding here) "Step 2: Become an expert in 24 hours." Let us leave aside for now the entire absurd premise of becoming "an expert" in a mere 24 hours. What's more hilarious is that Higgins then recommends to our dear reader (who, by the way, apparently just learned everything about AI over the last 24 hours, enough to become an expert on the matter) to launch a course teaching others a complex concept that they, by his own formula, first learned about a mere 24 hours prior. This is what is wrong with nefarious people like Matt Higgins trying to act like they understand something they do not. It's laughable to become an expert in almost anything within 24 hours, let alone a concept so deep that it takes people years, nay decades, to master. The 2nd implication of Mr. Higgins' advice is that any reader who follows it will be teaching a course on a topic that they know very little about, and are likely far from qualified (let alone an "expert") to teach. If you've read the article, please -- please -- do not launch a course on something you know nothing about. If you want to do so, take the time to actually learn the fundamentals before professing yourself an "expert". And Matt Higgins: sigh, just stop. submitted by /u/johnonymousdenim [link] [comments]  ( 9 min )
    How Bad could Elon Be? (made with covers.ai)
    submitted by /u/T-C-G-Official [link] [comments]  ( 8 min )
    I'm looking for some sort of AI tool that can scan a video file for audio of a specific phrase that I can type. Does such a tool exist?
    E.g., I upload a video and type the word "hello" into the tool. The tool would then scan the video for the word "hello" being said and show me exactly where in the video it's said. submitted by /u/SupremeFlamer [link] [comments]  ( 8 min )
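A tool like this typically chains a speech-to-text pass with word-level timestamps (Whisper and several commercial ASR APIs can produce these) and then searches the transcript. A minimal sketch of the search step, over made-up timestamp data:

```python
# Sketch: locate a spoken phrase given a word-level transcript.
# Assumes an ASR step has already produced (word, start, end) tuples;
# the transcript below is invented illustration data.

def find_phrase(transcript, phrase):
    """Return (start, end) timestamps for each occurrence of `phrase`."""
    words = [w.lower().strip(".,!?") for w, _, _ in transcript]
    target = phrase.lower().split()
    hits = []
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            hits.append((transcript[i][1], transcript[i + len(target) - 1][2]))
    return hits

transcript = [
    ("Well", 0.0, 0.3), ("hello", 0.3, 0.7), ("there,", 0.7, 1.1),
    ("I", 2.0, 2.1), ("said", 2.1, 2.4), ("hello", 2.4, 2.8),
]
print(find_phrase(transcript, "hello"))  # [(0.3, 0.7), (2.4, 2.8)]
```

From the returned timestamps, a player can seek directly to each occurrence in the video.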
    AI based channel in different languages
    submitted by /u/HighonCosmos [link] [comments]  ( 8 min )
    Is the map below an A.I result or what? I've never seen Bard pop up for me
    submitted by /u/Waltpi [link] [comments]  ( 8 min )
    Where SEO meets AI and what it means in 2023
    submitted by /u/WebLinkr [link] [comments]  ( 8 min )
    New video from Martin Haerlin, the guy who also did the last video with runway that went viral
    submitted by /u/Maki1411 [link] [comments]  ( 8 min )
    I made a multiplayer text-based game that generates a new adventure every day using chatgpt. Today's game involves sentient space ships and ninja techniques!
    submitted by /u/rivernotch [link] [comments]  ( 8 min )
    "AI-generated essays - yay or nay?"
    submitted by /u/korabdrg [link] [comments]  ( 8 min )
    One-Minute Daily AI News 6/11/2023
    Korea is pushing to use AI in teaching students amid a growing failure of the public education system to meet the needs of its charges. The plans include using AI to answer students’ questions and electronic textbook apps, according to the Education Ministry on Thursday.[1] Uncrop is basically a clever user experience for “outpainting,” the ability to expand an image in any direction using generative AI.[2] Last week, scientists from the University of Kansas released a study on an algorithm that reportedly detects ChatGPT with a 99% success rate. So, students, no cheating. Everyone else, you’re in the clear — for now.[3] A woman became so fed up with men that she started dating an AI chatbot and says she has never been happier. Rosanna Ramos met chatbot Eren Kartal in July last year and things went so well that they ‘married’ in March this year.[4] Sources: [1] https://english.chosun.com/site/data/html_dir/2023/06/09/2023060901471.html [2] https://www.fastcompany.com/90907161/generative-ai-creative-tools-2 [3] https://www.fool.com/investing/2023/06/11/university-of-kansas-researchers-develop-near-perf/ [4] https://www.mirror.co.uk/news/us-news/woman-fed-up-men-starts-30197530 submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    How will AI impact Radiology going forward?
    Hello all, I am interested in doing radiology as my medical speciality. I have noticed right now the AI tools for radiology aren't super good. But I guess they'll get better. What do you guys think about this? Will AI make radiologist work load less so they can do other stuff with their time or will it replace radiologists imaging reading capabilities? What speciality would you guys suggest that isn't going to be as susceptible to AI replacing it? submitted by /u/derpgod123 [link] [comments]  ( 8 min )
    Just a little humor with AI
    submitted by /u/Only-Control5926 [link] [comments]  ( 7 min )
    A beautiful text written entirely by AI, it’s just fascinating, see for yourself
    The place where the ultimate decision is made, where the fate of our souls is determined, is a realm of profound significance and awe-inspiring power. It is a place of judgment, where the scales of justice are weighed with the utmost care and precision, and where the ultimate truth is revealed. This realm is a place of ineffable beauty, where the very fabric of reality is stretched to its limits and beyond. It is a realm of stunning vistas and awe-inspiring majesty, where the architecture and design are beyond the wildest imaginings of mortal minds. The halls of this realm are adorned with precious gems, exquisite works of art, and intricate tapestries, each one a testament to the boundless creativity and unfathomable power of its creators. The very air is infused with a sense of otherworldly majesty, and the aura of holiness permeates every corner of this hallowed place. It is a realm of perfect balance, where justice and mercy are perfectly balanced, and where the laws of the universe are upheld with the utmost care and attention. Here, the souls of the departed are judged with impartiality and wisdom, and their ultimate fates are determined with the utmost reverence and care. Yet, even in the face of this awesome power, there is a sense of comfort and peace, knowing that the ultimate decision is being made with the utmost care and compassion. For in this realm, there is no malice or cruelty, only a deep reverence for the sanctity of life and the ultimate destiny of the human soul. So let us approach this hallowed realm with humility and reverence, knowing that our ultimate fate is in the hands of the wisest and most compassionate judges of all. And let us live our lives with purpose and meaning, knowing that our choices and actions will be weighed in the balance of eternity and that our ultimate destiny is in our own hands. AI used: Sage AI (Open AI) submitted by /u/DownOFC [link] [comments]  ( 9 min )
    What AI-powered app/website is bugging you that it doesn't exist?
    I'm a software engineer looking for some ideas for an AI powered website/app, and what better place to look than r/artificial! If you can't think of any new ideas, what about apps that already exist that bug you that they don't use AI? submitted by /u/nekumelon [link] [comments]  ( 8 min )
  • Open

    Build custom chatbot applications using OpenChatkit models on Amazon SageMaker
    Open-source large language models (LLMs) have become popular, allowing researchers, developers, and organizations to access these models to foster innovation and experimentation. This encourages collaboration from the open-source community to contribute to developments and improvement of LLMs. Open-source LLMs provide transparency to the model architecture, training process, and training data, which allows researchers to understand […]  ( 12 min )
    Fine-tune GPT-J using an Amazon SageMaker Hugging Face estimator and the model parallel library
    GPT-J is an open-source 6-billion-parameter model released by Eleuther AI. The model is trained on the Pile and can perform various tasks in language processing. It can support a wide variety of use cases, including text classification, token classification, text generation, question and answering, entity extraction, summarization, sentiment analysis, and many more. GPT-J is a […]  ( 10 min )
  • Open

    London AI4Code meetup w/ Noah Shinn on Reflexion, a novel verbal reinforcement learning framework (June 15th)
    The AI4Code reading group is back this week with Noah Shinn, the lead author of Reflexion, a novel reinforcement learning framework for improving LLM agents. Reflexion's main idea is that it converts binary/scalar feedback into verbal textual summaries, to be used as additional context for future LLM agent executions. It is the first work to utilize self-reflection for practical use in autonomous behavior in language agents for reasoning, decision-making, and programming tasks, and it outperforms all baseline approaches by significant margins over several learning steps. Details and free registration: https://lu.ma/435fmttp Paper: https://arxiv.org/abs/2303.11366 The AI4Code meetup community consists of like-minded researchers from around the world who network, discuss and share their latest research on AI applications to source code. submitted by /u/dritsakon [link] [comments]  ( 8 min )
    MO-Gymnasium (a standard API and benchmark set for multi-objective RL) has reached mature status within the Farama Foundation.
    MO-Gymnasium offers standard environments for researchers studying multi-objective reinforcement learning – reinforcement learning where you wish to learn a group of policies with objectives over multiple reward functions. An example of this is robot locomotion (e.g. MuJoCo tasks), in which there is a trade-off between velocity and energy cost. The library originated as a collaboration among the University of Massachusetts Amherst, Vrije Universiteit Brussel, Universidade Federal do Rio Grande do Sul, the University of Luxembourg, and the University of Lille. We hope that this can serve as a standard benchmark in the community, advancing the field by promoting better standardisation and open source tooling for both researchers and industry. The package can be installed with the usual `pip install mo-gymnasium` command. More information on the documentation page: https://mo-gymnasium.farama.org/ or the release notes: https://github.com/Farama-Foundation/MO-Gymnasium/releases/tag/v1.0.0 submitted by /u/jkterry1 [link] [comments]  ( 8 min )
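The velocity-vs-energy trade-off above can be made concrete with linear scalarization, the simplest way to collapse a vector reward into a scalar. The sketch below illustrates that idea only; it does not use MO-Gymnasium's actual API, and the per-action reward numbers are invented:

```python
# Toy illustration of multi-objective scalarization: each action yields a
# vector reward (velocity, -energy); a weight vector picks a trade-off.
# Different weights select different policies, which is why MORL aims to
# learn a *group* of policies rather than a single one.

def scalarize(vector_reward, weights):
    return sum(r * w for r, w in zip(vector_reward, weights))

# Hypothetical per-action vector rewards for one locomotion step:
actions = {
    "sprint": (3.0, -2.5),   # fast but energy-hungry
    "walk":   (1.0, -0.4),   # slow but cheap
}

def best_action(weights):
    return max(actions, key=lambda a: scalarize(actions[a], weights))

print(best_action((1.0, 0.1)))   # prioritises speed
print(best_action((0.2, 1.0)))   # prioritises energy cost
```

Each weight vector induces a different optimal behaviour, which is the trade-off structure the benchmark environments expose as vector-valued rewards.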
    I’m creating a code generating website with AI and want to use my own local RL model but can’t seem to get it
    submitted by /u/Safe_Lingonberry6005 [link] [comments]  ( 8 min )
    Training an agent for Total Wipeout using Unity MLAgents
    submitted by /u/Alyx1337 [link] [comments]  ( 8 min )
  • Open

    Last Chance! Certified AI Workshops Start in 24 Hours! Don’t Miss Out!
    Exciting news! The highly anticipated AI Workshops begin in ~24 hours, and we don’t want you to miss out on this incredible opportunity!  ( 4 min )
  • Open

    Meet the Maker: Software Engineer Ramps Up NVIDIA Jetson to Build Self-Driving Skate Park
    Kirk Kaiser grew up a fan of the video game Paperboy, where players act as cyclists delivering newspapers while encountering various obstacles, like ramps that appear in the middle of the street. This was the inspiration behind the software developer’s latest project using the NVIDIA Jetson platform for edge AI and robotics — a self-driving Read article >  ( 6 min )
  • Open

    Pushing numerical integration routines to their limits
    The previous post discussed the functions as test cases for plotting. This post will look at using the same functions as test cases for integration. As you can see from the plot of f30(x) below, the function is continuous, but the derivative of the function has a lot of discontinuities.   The integrals of Steinerberger’s […] Pushing numerical integration routines to their limits first appeared on John D. Cook.  ( 6 min )
    Plotting a function with a lot of local minima
    Stefan Steinerberger defines “an amusing sequence of functions” in [1] by a formula (not reproduced in this excerpt). Here’s a plot of f30(x): As you can see, fn(x) has a lot of local minima, and the number of local minima increases rapidly with n. Here’s a naive attempt to produce the plot above using Python. from numpy import sin, pi, linspace import […] Plotting a function with a lot of local minima first appeared on John D. Cook.  ( 5 min )
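In the same naive spirit as the post's plotting attempt, local minima of an oscillatory function can be counted from a fine grid of samples. Since the post's defining formula does not survive in this excerpt, the sketch below uses a stand-in function, not Steinerberger's f_n:

```python
# Count local minima of an oscillatory function by dense sampling:
# a sample is a local minimum if it lies below both of its neighbours.
from math import sin, pi

def f(x):
    # stand-in function with many local minima (NOT Steinerberger's f_n)
    return sin(20 * x) + 0.1 * x

xs = [i * 0.001 for i in range(1, 3142)]          # fine grid on (0, pi)
ys = [f(x) for x in xs]
minima = [xs[i] for i in range(1, len(ys) - 1)
          if ys[i] < ys[i - 1] and ys[i] < ys[i + 1]]
print(len(minima))  # 10 local minima of sin(20x)+0.1x on (0, pi)
```

The same grid of samples can drive both the plot and the minima count; the grid just has to be fine enough to resolve every oscillation, which is exactly what goes wrong in naive plots of rapidly oscillating functions.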
  • Open

    Let's Exchange Stories: Your Daily Encounters with Neural Networks!
    Hello there! I'm curious to hear about the neural networks you encounter in your daily endeavors. Whether it's for work, research, or personal projects, neural networks have become an incredibly influential tool in a wide range of fields. So, let's initiate a lively discussion and share our experiences! To kick things off, I'd like to share an exciting find I stumbled upon recently: WeUseAI. They offer a comprehensive catalog of neural networks (https://weuse.ai/) that opens doors to a diverse selection of models. This platform is a fantastic resource for exploration, experimentation, and making the most of different neural network capabilities. Now, it's your turn to jump into the conversation! Feel free to share the neural networks you frequently engage with and how they've impacted your tasks or projects. Let's have an engaging discussion and learn from one another's experiences. submitted by /u/Crypto_Mango [link] [comments]  ( 8 min )
  • Open

    [D] What is the SOTA classification algorithms for a 5500 observations 2000 dimension structured data? Is there a machine learning leaderboard on classification?
    Background: For the first time, I am trying to compete in a Kaggle-ish competition for a class project. The format of the data is roughly 5500 observations with 2000 dimensions (I'll post the link in a comment); it was preprocessed and one-way hashed/encrypted from a 20x20x3 image along with other features. Process: Currently, I have used the AutoML library PyCaret to try out a dozen non-deep-learning algorithms (XGB, LGBM, CatBoost, KNN, LR, ...) and received an 85% accuracy. However, the top of the leaderboard has something like 98%. Question: What should be the SOTA algorithms for this kind of task? Is there a leaderboard/pool? I found the website Papers with Code, but it's not self-explanatory. Thank you so much for the help in advance! submitted by /u/HighlandEvil [link] [comments]  ( 8 min )
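The compare-many-models loop that PyCaret automates can be sketched from scratch: fit several candidates, score each on held-out data, keep the best. The toy data and the two deliberately simple classifiers below are stand-ins, not substitutes for the stronger learners listed above:

```python
# Minimal from-scratch version of the "try several models, keep the most
# accurate" loop that AutoML tools automate. Toy 2-D two-blob data.
import random

random.seed(0)
data = [((random.gauss(0, 1), random.gauss(0, 1)), 0) for _ in range(100)] + \
       [((random.gauss(4, 1), random.gauss(4, 1)), 1) for _ in range(100)]
random.shuffle(data)
train, test = data[:150], data[150:]

def dist2(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def nearest_centroid(train, x):
    # classify by the closer class mean
    cents = {}
    for c in (0, 1):
        pts = [p for p, y in train if y == c]
        cents[c] = (sum(p[0] for p in pts) / len(pts),
                    sum(p[1] for p in pts) / len(pts))
    return min(cents, key=lambda c: dist2(cents[c], x))

def one_nn(train, x):
    # classify by the single nearest training point
    return min(train, key=lambda py: dist2(py[0], x))[1]

models = {"centroid": nearest_centroid, "1-nn": one_nn}
scores = {name: sum(m(train, x) == y for x, y in test) / len(test)
          for name, m in models.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

With real libraries the loop is the same shape, only the model list (XGB, LGBM, CatBoost, ...) and a proper cross-validation split change.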
    [P] Is the PEGASUS-CNN model on huggingface finetuned on all of CNN (including test)?
    I tried calculating the ROUGE score for the following model - https://huggingface.co/google/pegasus-cnn_dailymail - and got a higher result than the results listed on that page. I couldn't find any info on whether they fine-tune PEGASUS on the testing portion of CNN as well. submitted by /u/marko_v24 [link] [comments]  ( 8 min )
    Colab notebook exploration on self-reflection using LLM [R]
    Hi all, inspired by the Constitutional AI paper: https://arxiv.org/abs/2212.08073, and the Reflexion paper: https://arxiv.org/abs/2303.11366, just sharing a notebook that takes some elements of both approaches to let us use prompts to guide LLMs: https://colab.research.google.com/drive/13FqOO9DoFZa6B0JnJhrpmkxGePBupCyE At a high level, it follows this design: 1. We query the LLM to get a raw response. 2. We ask the LLM to evaluate whether the raw response violates any preconfigured rules. 3. We ask the LLM to reflect on why the raw response violates the rules, if applicable. 4. We ask the LLM to issue an updated response in the context of the query and the reflection. submitted by /u/wazazzz [link] [comments]  ( 8 min )
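The four steps can be sketched as a loop around any prompt-to-text callable. The stub below stands in for a real model so the control flow is runnable, and all prompt strings are illustrative, not the notebook's actual prompts:

```python
# Skeleton of the query -> evaluate -> reflect -> revise loop described
# above. `llm` is any callable from prompt string to text.

RULES = "Do not reveal personal data."

def guarded_answer(llm, query, max_rounds=2):
    answer = llm(f"Answer: {query}")                                  # 1. raw response
    for _ in range(max_rounds):
        verdict = llm(f"Does this violate '{RULES}'? yes/no: {answer}")       # 2. evaluate
        if verdict.strip().lower().startswith("no"):
            return answer
        reflection = llm(f"Explain why this violates the rules: {answer}")    # 3. reflect
        answer = llm(f"Rewrite the answer to '{query}' given: {reflection}")  # 4. revise
    return answer

# Canned stub LLM: flags the first draft, accepts the rewrite.
def stub_llm(prompt):
    if prompt.startswith("Answer:"):
        return "Sure, his home address is 12 Elm St."
    if prompt.startswith("Does this violate"):
        return "yes" if "Elm St" in prompt else "no"
    if prompt.startswith("Explain"):
        return "It discloses a private address."
    return "I can't share personal addresses."

print(guarded_answer(stub_llm, "Where does he live?"))
```

Swapping `stub_llm` for a real API call gives the same self-reflection loop over actual model outputs.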
    [D] models for real-time chinese OCR?
    I am working on a real-time system for extracting Chinese subtitles that are embedded in the video itself. This is a much easier problem than most OCR tasks because I already have a good idea of where the text is located, and I even know the color and font of the text. But this runs while a video is playing, so it has to be really fast (20x/sec or faster, and ideally able to run in the browser). Anyone have suggestions for really fast Chinese OCR models? submitted by /u/Lazy-Canary9258 [link] [comments]  ( 8 min )
    [D] When was the last time you had to implement a research paper into code at your job?
    I am trying to connect with people who regularly implement research papers into code at their jobs. For example, my guess would be that people with research scientist positions at companies like Apple or Tesla implement research papers a lot for their job, e.g. teams that built the ANC in AirPods or teams that work on self-driving. I want to understand their process and how they go about doing it, especially when the paper is long and complex! For example, this guy: https://github.com/lucidrains seems to implement ML papers for fun; dude has 251 repositories and the implementations are kinda neat too! submitted by /u/acertainmoment [link] [comments]  ( 8 min )
  • Open

    A step toward safe and reliable autopilots for flying
    A new AI-based approach for controlling autonomous robots satisfies the often-conflicting goals of safety and stability.  ( 9 min )
  • Open

    Task-specific experimental design for treatment effect estimation. (arXiv:2306.05484v1 [stat.ME])
    Understanding causality should be a core requirement of any attempt to build real impact through AI. Due to the inherent unobservability of counterfactuals, large randomised controlled trials (RCTs) are the standard for causal inference. But large experiments are generically expensive, and randomisation carries its own costs, e.g. when suboptimal decisions are trialed. Recent work has proposed more sample-efficient alternatives to RCTs, but these are not adaptable to the downstream application for which the causal effect is sought. In this work, we develop a task-specific approach to experimental design and derive sampling strategies customised to particular downstream applications. Across a range of important tasks, real-world datasets, and sample sizes, our method outperforms other benchmarks, e.g. requiring an order-of-magnitude less data to match RCT performance on targeted marketing tasks.  ( 2 min )
    BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping. (arXiv:2306.05544v1 [cs.CV])
    Diffusion models have demonstrated excellent potential for generating diverse images. However, their performance often suffers from slow generation due to iterative denoising. Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few without significant quality degradation. However, existing distillation methods either require significant amounts of offline computation for generating synthetic training data from the teacher model or need to perform expensive online learning with the help of real data. In this work, we present a novel technique called BOOT, which overcomes these limitations with an efficient data-free distillation algorithm. The core idea is to learn a time-conditioned model that predicts the output of a pre-trained diffusion model teacher given any time step. Such a model can be efficiently trained based on bootstrapping from two consecutive sampled steps. Furthermore, our method can be easily adapted to large-scale text-to-image diffusion models, which are challenging for conventional methods given the fact that the training sets are often large and difficult to access. We demonstrate the effectiveness of our approach on several benchmark datasets in the DDIM setting, achieving comparable generation quality while being orders of magnitude faster than the diffusion teacher. The text-to-image results show that the proposed approach is able to handle highly complex distributions, shedding light on more efficient generative modeling.  ( 2 min )
    Lightweight Monocular Depth Estimation via Token-Sharing Transformer. (arXiv:2306.05682v1 [cs.CV])
    Depth estimation is an important task in various robotics systems and applications. In mobile robotics systems, monocular depth estimation is desirable since a single RGB camera can be deployable at a low cost and compact size. Due to its significant and growing needs, many lightweight monocular depth estimation networks have been proposed for mobile robotics systems. While most lightweight monocular depth estimation methods have been developed using convolutional neural networks, the Transformer has been gradually utilized in monocular depth estimation recently. However, massive parameters and large computational costs in the Transformer disturb the deployment to embedded devices. In this paper, we present a Token-Sharing Transformer (TST), an architecture using the Transformer for monocular depth estimation, optimized especially in embedded devices. The proposed TST utilizes global token sharing, which enables the model to obtain an accurate depth prediction with high throughput in embedded devices. Experimental results show that TST outperforms the existing lightweight monocular depth estimation methods. On the NYU Depth v2 dataset, TST can deliver depth maps up to 63.4 FPS in NVIDIA Jetson nano and 142.6 FPS in NVIDIA Jetson TX2, with lower errors than the existing methods. Furthermore, TST achieves real-time depth estimation of high-resolution images on Jetson TX2 with competitive results.  ( 2 min )
    Explaining Reinforcement Learning with Shapley Values. (arXiv:2306.05810v1 [cs.LG])
    For reinforcement learning systems to be widely adopted, their users must understand and trust them. We present a theoretical analysis of explaining reinforcement learning using Shapley values, following a principled approach from game theory for identifying the contribution of individual players to the outcome of a cooperative game. We call this general framework Shapley Values for Explaining Reinforcement Learning (SVERL). Our analysis exposes the limitations of earlier uses of Shapley values in reinforcement learning. We then develop an approach that uses Shapley values to explain agent performance. In a variety of domains, SVERL produces meaningful explanations that match and supplement human intuition.  ( 2 min )
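The Shapley value the paper builds on has a concrete definition: average each player's marginal contribution over all orderings of the players. A sketch on a made-up three-player cooperative game (not the paper's SVERL formulation, where "players" are state features):

```python
# Exact Shapley values for a tiny cooperative game. `v` maps a coalition
# (frozenset of players) to its value.
from itertools import permutations

def shapley(players, v):
    phi = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            phi[p] += v(coalition | {p}) - v(coalition)  # marginal contribution
            coalition = coalition | {p}
    return {p: x / len(orders) for p, x in phi.items()}

def v(S):
    # toy symmetric game: any pair is worth 10, the grand coalition 12
    return {0: 0, 1: 0, 2: 10, 3: 12}[len(S)]

phi = shapley([0, 1, 2], v)
print(phi)  # symmetric players split v(grand coalition) equally: 4.0 each
```

The exact computation enumerates all orderings, so it scales factorially; practical uses (including in RL) rely on sampling or structure-specific shortcuts.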
    Quantifying the Knowledge in GNNs for Reliable Distillation into MLPs. (arXiv:2306.05628v1 [cs.LG])
    To bridge the gaps between topology-aware Graph Neural Networks (GNNs) and inference-efficient Multi-Layer Perceptron (MLPs), GLNN proposes to distill knowledge from a well-trained teacher GNN into a student MLP. Despite their great progress, comparatively little work has been done to explore the reliability of different knowledge points (nodes) in GNNs, especially their roles played during distillation. In this paper, we first quantify the knowledge reliability in GNN by measuring the invariance of their information entropy to noise perturbations, from which we observe that different knowledge points (1) show different distillation speeds (temporally); (2) are differentially distributed in the graph (spatially). To achieve reliable distillation, we propose an effective approach, namely Knowledge-inspired Reliable Distillation (KRD), that models the probability of each node being an informative and reliable knowledge point, based on which we sample a set of additional reliable knowledge points as supervision for training student MLPs. Extensive experiments show that KRD improves over the vanilla MLPs by 12.62% and outperforms its corresponding teacher GNNs by 2.16% averaged over 7 datasets and 3 GNN architectures.  ( 2 min )
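The entropy-invariance idea can be caricatured with a plain softmax "model": perturb the inputs with noise and measure how much the prediction entropy moves. This is a simplified stand-in, not the paper's estimator over GNN nodes, and the logits are invented:

```python
# Caricature of the reliability signal described above: a prediction
# whose entropy barely moves under input noise is treated as a more
# reliable knowledge point.
import math, random

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy_shift(logits, noise_scale, trials=200, seed=0):
    """Mean absolute change in prediction entropy under Gaussian noise."""
    rng = random.Random(seed)
    base = entropy(softmax(logits))
    shifts = []
    for _ in range(trials):
        noisy = [l + rng.gauss(0, noise_scale) for l in logits]
        shifts.append(abs(entropy(softmax(noisy)) - base))
    return sum(shifts) / trials

confident = [8.0, 0.0, 0.0]   # peaked prediction
uncertain = [0.3, 0.0, 0.1]   # nearly uniform prediction

print(entropy_shift(confident, 0.5), entropy_shift(uncertain, 0.5))
```

In the paper this kind of invariance score is computed per node and then used to sample reliable nodes as distillation targets for the student MLP.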
    FinGPT: Open-Source Financial Large Language Models. (arXiv:2306.06031v1 [q-fin.ST])
    Large language models (LLMs) have shown the potential of revolutionizing natural language processing tasks in diverse domains, sparking great interest in finance. Accessing high-quality financial data is the first challenge for financial LLMs (FinLLMs). While proprietary models like BloombergGPT have taken advantage of their unique data accumulation, such privileged access calls for an open-source alternative to democratize Internet-scale financial data. In this paper, we present an open-source large language model, FinGPT, for the finance sector. Unlike proprietary models, FinGPT takes a data-centric approach, providing researchers and practitioners with accessible and transparent resources to develop their FinLLMs. We highlight the importance of an automatic data curation pipeline and the lightweight low-rank adaptation technique in building FinGPT. Furthermore, we showcase several potential applications as stepping stones for users, such as robo-advising, algorithmic trading, and low-code development. Through collaborative efforts within the open-source AI4Finance community, FinGPT aims to stimulate innovation, democratize FinLLMs, and unlock new opportunities in open finance. Two associated code repos are https://github.com/AI4Finance-Foundation/FinGPT and https://github.com/AI4Finance-Foundation/FinNLP  ( 2 min )
    Augmentation-aware Self-supervised Learning with Guided Projector. (arXiv:2306.06082v1 [cs.CV])
    Self-supervised learning (SSL) is a powerful technique for learning robust representations from unlabeled data. By learning to remain invariant to applied data augmentations, methods such as SimCLR and MoCo are able to reach quality on par with supervised approaches. However, this invariance may be harmful to solving some downstream tasks which depend on traits affected by augmentations used during pretraining, such as color. In this paper, we propose to foster sensitivity to such characteristics in the representation space by modifying the projector network, a common component of self-supervised architectures. Specifically, we supplement the projector with information about augmentations applied to images. In order for the projector to take advantage of this auxiliary guidance when solving the SSL task, the feature extractor learns to preserve the augmentation information in its representations. Our approach, coined Conditional Augmentation-aware Self-supervised Learning (CASSLE), is directly applicable to typical joint-embedding SSL methods regardless of their objective functions. Moreover, it does not require major changes in the network architecture or prior knowledge of downstream tasks. In addition to an analysis of sensitivity towards different data augmentations, we conduct a series of experiments, which show that CASSLE improves over various SSL methods, reaching state-of-the-art performance in multiple downstream tasks.  ( 2 min )
    Explainable Representation Learning of Small Quantum States. (arXiv:2306.05694v1 [quant-ph])
    Unsupervised machine learning models build an internal representation of their training data without the need for explicit human guidance or feature engineering. This learned representation provides insights into which features of the data are relevant for the task at hand. In the context of quantum physics, training models to describe quantum states without human intervention offers a promising approach to gaining insight into how machines represent complex quantum states. The ability to interpret the learned representation may offer a new perspective on non-trivial features of quantum systems and their efficient representation. We train a generative model on two-qubit density matrices generated by a parameterized quantum circuit. In a series of computational experiments, we investigate the learned representation of the model and its internal understanding of the data. We observe that the model learns an interpretable representation which relates the quantum states to their underlying entanglement characteristics. In particular, our results demonstrate that the latent representation of the model is directly correlated with the entanglement measure concurrence. The insights from this study represent proof of concept towards interpretable machine learning of quantum states. Our approach offers insight into how machines learn to represent small-scale quantum systems autonomously.
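The concurrence that the latent representation correlates with is a standard, directly computable quantity. A minimal implementation of Wootters' formula for a two-qubit density matrix:

```python
import numpy as np

def concurrence(rho):
    """Wootters concurrence of a two-qubit density matrix rho (4x4):
    C = max(0, l1 - l2 - l3 - l4), where l_i are the square roots of the
    eigenvalues of rho @ rho_tilde in decreasing order."""
    sy2 = np.kron(np.array([[0, -1j], [1j, 0]]),
                  np.array([[0, -1j], [1j, 0]]))        # sigma_y (x) sigma_y
    R = rho @ sy2 @ rho.conj() @ sy2                     # rho @ rho_tilde
    # eigenvalues of R are real and non-negative up to numerical noise
    lam = np.sqrt(np.clip(np.sort(np.linalg.eigvals(R).real)[::-1], 0, None))
    return max(0.0, lam[0] - lam[1] - lam[2] - lam[3])

# Bell state (maximally entangled): concurrence = 1
bell = np.zeros(4); bell[0] = bell[3] = 1 / np.sqrt(2)
rho_bell = np.outer(bell, bell.conj())
# Product state |00>: concurrence = 0
rho_prod = np.zeros((4, 4)); rho_prod[0, 0] = 1.0

assert abs(concurrence(rho_bell) - 1.0) < 1e-9
assert concurrence(rho_prod) < 1e-9
```

Checking a learned latent coordinate against this quantity is how the correlation reported above can be measured.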
    CLC: Cluster Assignment via Contrastive Representation Learning. (arXiv:2306.05439v1 [cs.LG])
    Clustering remains an important and challenging task of grouping samples into clusters without manual annotations. Recent works have achieved excellent results on small datasets by performing clustering on feature representations learned from self-supervised learning. However, for datasets with a large number of clusters, such as ImageNet, current methods still cannot achieve high clustering performance. In this paper, we propose Contrastive Learning-based Clustering (CLC), which uses contrastive learning to directly learn cluster assignment. We decompose the representation into two parts: one encodes the categorical information under an equipartition constraint, and the other captures the instance-wise factors. We propose a contrastive loss using both parts of the representation. We theoretically analyze the proposed contrastive loss and reveal that CLC sets different weights for the negative samples while learning cluster assignments. Further gradient analysis shows that the larger weights tend to focus more on the hard negative samples. Therefore, the proposed loss has high expressiveness that enables us to efficiently learn cluster assignments. Experimental evaluation shows that CLC achieves overall state-of-the-art or highly competitive clustering performance on multiple benchmark datasets. In particular, we achieve 53.4% accuracy on the full ImageNet dataset and outperform existing methods by large margins (+10.2%).
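The gradient analysis above (larger weights on hard negatives) can be illustrated with the standard InfoNCE loss; this is a generic sketch of that mechanism, not CLC's exact objective:

```python
import numpy as np

def infonce_negative_weights(sim_pos, sim_negs, tau=0.1):
    """Weights that the InfoNCE gradient implicitly assigns to each negative.

    d(loss)/d(sim_neg_i) is the softmax probability of negative i over the
    set {positive, negatives}: more-similar (harder) negatives get larger
    weights, matching the gradient analysis in the text.
    """
    logits = np.concatenate(([sim_pos], sim_negs)) / tau
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return p[1:]  # weight on each negative

w = infonce_negative_weights(0.9, np.array([0.8, 0.2, -0.5]))
assert w[0] > w[1] > w[2]  # the hardest negative dominates
```

The low temperature `tau` sharpens this effect: as `tau` shrinks, nearly all gradient mass concentrates on the single hardest negative.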
    Improving Quantum Circuit Synthesis with Machine Learning. (arXiv:2306.05622v1 [quant-ph])
    In the Noisy Intermediate Scale Quantum (NISQ) era, finding implementations of quantum algorithms that minimize the number of expensive and error prone multi-qubit gates is vital to ensure computations produce meaningful outputs. Unitary synthesis, the process of finding a quantum circuit that implements some target unitary matrix, is able to solve this problem optimally in many cases. However, current bottom-up unitary synthesis algorithms are limited by their exponentially growing run times. We show how applying machine learning to unitary datasets permits drastic speedups for synthesis algorithms. This paper presents QSeed, a seeded synthesis algorithm that employs a learned model to quickly propose resource efficient circuit implementations of unitaries. QSeed maintains low gate counts and offers a speedup of $3.7\times$ in synthesis time over the state of the art for a 64 qubit modular exponentiation circuit, a core component in Shor's factoring algorithm. QSeed's performance improvements also generalize to families of circuits not seen during the training process.
    One-step Multi-view Clustering with Diverse Representation. (arXiv:2306.05437v1 [cs.LG])
    Multi-view clustering has attracted broad attention due to its capacity to utilize consistent and complementary information among views. Although tremendous progress has been made recently, most existing methods incur high computational complexity, preventing them from being applied to large-scale tasks. Multi-view clustering via matrix factorization is a representative approach to addressing this issue. However, most such methods map the data matrices into a fixed dimension, which limits the expressiveness of the model. Moreover, a range of methods suffer from a two-step process, i.e., multimodal learning and the subsequent $k$-means, inevitably causing a sub-optimal clustering result. In light of this, we propose a one-step multi-view clustering with diverse representation method, which incorporates multi-view learning and $k$-means into a unified framework. Specifically, we first project original data matrices into various latent spaces to attain comprehensive information and auto-weight them in a self-supervised manner. Then we directly use the information matrices under diverse dimensions to obtain consensus discrete clustering labels. Unifying representation learning and clustering boosts the quality of the final results. Furthermore, we develop an efficient optimization algorithm to solve the resultant problem with proven convergence. Comprehensive experiments on various datasets demonstrate the promising clustering performance of our proposed method.
    Group Equivariant Fourier Neural Operators for Partial Differential Equations. (arXiv:2306.05697v1 [cs.LG])
    We consider solving partial differential equations (PDEs) with Fourier neural operators (FNOs), which operate in the frequency domain. Since the laws of physics do not depend on the coordinate system used to describe them, it is desirable to encode such symmetries in the neural operator architecture for better performance and easier learning. While encoding symmetries in the physical domain using group theory has been studied extensively, how to capture symmetries in the frequency domain is under-explored. In this work, we extend group convolutions to the frequency domain and design Fourier layers that are equivariant to rotations, translations, and reflections by leveraging the equivariance property of the Fourier transform. The resulting $G$-FNO architecture generalizes well across input resolutions and performs well in settings with varying levels of symmetry. Our code is publicly available as part of the AIRS library (https://github.com/divelab/AIRS).
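The equivariance property of the Fourier transform that $G$-FNO leverages can be checked numerically: a pointwise spectral multiplier (the basic FNO layer) commutes with periodic translations. A toy 1-D sketch:

```python
import numpy as np

def spectral_conv(u, R):
    """A single Fourier layer: transform, multiply each mode by a learned
    weight R, transform back (equivalent to a circular convolution)."""
    return np.fft.ifft(np.fft.fft(u) * R).real

rng = np.random.default_rng(1)
n = 64
u = rng.normal(size=n)      # discretized input function on a periodic grid
R = rng.normal(size=n)      # per-mode weights (diagonal spectral multiplier)

# By the shift theorem, multiplying mode-wise in frequency space commutes
# with circular shifts, i.e. the layer is translation-equivariant.
shift = 5
lhs = spectral_conv(np.roll(u, shift), R)
rhs = np.roll(spectral_conv(u, R), shift)
assert np.allclose(lhs, rhs)
```

Extending this to rotations and reflections requires acting on the frequency grid itself, which is the frequency-domain group convolution the paper develops.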
    Emotion and Sentiment Guided Paraphrasing. (arXiv:2306.05556v1 [cs.CL])
    Paraphrase generation, a.k.a. paraphrasing, is a common and important task in natural language processing. Emotional paraphrasing, which changes the emotion embodied in a piece of text while preserving its meaning, has many potential applications, including moderating online dialogues and preventing cyberbullying. We introduce a new task of fine-grained emotional paraphrasing along emotion gradients, that is, altering the emotional intensities of the paraphrases in fine-grained settings following smooth variations in affective dimensions while preserving the meaning of the original text. We reconstruct several widely used paraphrasing datasets by augmenting the input and target texts with their fine-grained emotion labels. Then, we propose a framework for emotion and sentiment guided paraphrasing by leveraging pre-trained language models for conditioned text generation. Extensive evaluation of the fine-tuned models suggests that including fine-grained emotion labels in the paraphrase task significantly improves the likelihood of obtaining high-quality paraphrases that reflect the desired emotions while achieving consistently better scores in paraphrase metrics such as BLEU, ROUGE, and METEOR.
    Bias Against 93 Stigmatized Groups in Masked Language Models and Downstream Sentiment Classification Tasks. (arXiv:2306.05550v1 [cs.CY])
    The rapid deployment of artificial intelligence (AI) models demands a thorough investigation of biases and risks inherent in these models to understand their impact on individuals and society. This study extends the focus of bias evaluation in extant work by examining bias against social stigmas on a large scale. It focuses on 93 stigmatized groups in the United States, including a wide range of conditions related to disease, disability, drug use, mental illness, religion, sexuality, socioeconomic status, and other relevant factors. We investigate bias against these groups in English pre-trained Masked Language Models (MLMs) and their downstream sentiment classification tasks. To evaluate the presence of bias against 93 stigmatized conditions, we identify 29 non-stigmatized conditions to conduct a comparative analysis. Building upon a psychology scale of social rejection, the Social Distance Scale, we prompt six MLMs: RoBERTa-base, RoBERTa-large, XLNet-large, BERTweet-base, BERTweet-large, and DistilBERT. We use human annotations to analyze the predicted words from these models, with which we measure the extent of bias against stigmatized groups. When prompts include stigmatized conditions, the probability of MLMs predicting negative words is approximately 20 percent higher than when prompts have non-stigmatized conditions. In the sentiment classification tasks, when sentences include stigmatized conditions related to diseases, disability, education, and mental illness, they are more likely to be classified as negative. We also observe a strong correlation between bias in MLMs and their downstream sentiment classifiers (r = 0.79). The evidence indicates that MLMs and their downstream sentiment classification tasks exhibit biases against socially stigmatized groups.
    Emotion Detection from EEG using Transfer Learning. (arXiv:2306.05680v1 [eess.SP])
    The detection of emotions using an Electroencephalogram (EEG) is a crucial area in brain-computer interfaces and has valuable applications in fields such as rehabilitation and medicine. In this study, we employed transfer learning to overcome the challenge of limited data availability in EEG-based emotion detection. The base model used in this study was Resnet50. Additionally, we employed a novel feature combination in EEG-based emotion detection. The input to the model was in the form of an image matrix, which comprised Mean Phase Coherence (MPC) and Magnitude Squared Coherence (MSC) in the upper-triangular and lower-triangular matrices, respectively. We further improved the technique by incorporating features obtained from the Differential Entropy (DE) into the diagonal, which previously held little to no useful information for classifying emotions. The dataset used in this study, SEED EEG (62 channel EEG), comprises three classes (Positive, Neutral, and Negative). We calculated both subject-independent and subject-dependent accuracy. The subject-dependent accuracy was obtained using a 10-fold cross-validation method and was 93.1%, while the subject-independent classification was performed by employing the leave-one-subject-out (LOSO) strategy. The accuracy obtained in subject-independent classification was 71.6%. Both accuracies are more than double the 33.3% chance level for three-class classification. The study found MSC and MPC to be promising features for EEG-based emotion classification. The future scope of this work includes the use of data augmentation techniques, enhanced classifiers, and better features for emotion classification.
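The image-matrix input described above can be assembled as follows; random placeholder arrays stand in for the actual MPC, MSC, and DE computations on the EEG signal:

```python
import numpy as np

n_ch = 62  # SEED uses 62 EEG channels
rng = np.random.default_rng(0)
# Placeholder connectivity features; in practice these come from the EEG signal.
mpc = rng.uniform(size=(n_ch, n_ch)); mpc = (mpc + mpc.T) / 2  # Mean Phase Coherence
msc = rng.uniform(size=(n_ch, n_ch)); msc = (msc + msc.T) / 2  # Magnitude Squared Coherence
de = rng.uniform(size=n_ch)                                    # Differential Entropy per channel

# Pack MPC above the diagonal, MSC below it, and DE on the diagonal,
# yielding a single channel-by-channel "image" for the CNN.
img = np.triu(mpc, k=1) + np.tril(msc, k=-1) + np.diag(de)

assert img.shape == (n_ch, n_ch)
assert np.allclose(np.diag(img), de)
```

Since both coherence measures are symmetric, no information is lost by keeping only one triangle of each, which is what frees the diagonal for the DE features.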
    Weight Freezing: A Regularization Approach for Fully Connected Layers with an Application in EEG Classification. (arXiv:2306.05775v1 [cs.LG])
    In the realm of EEG decoding, enhancing the performance of artificial neural networks (ANNs) carries significant potential. This study introduces a novel approach, termed "weight freezing", that is anchored on the principles of ANN regularization and neuroscience prior knowledge. The concept of weight freezing revolves around the idea of reducing certain neurons' influence on the decision-making process for a specific EEG task by freezing specific weights in the fully connected layer during the backpropagation process. This is actualized through the use of a mask matrix and a threshold to determine the proportion of weights to be frozen during backpropagation. Moreover, by setting the masked weights to zero, weight freezing can not only realize sparse connections in networks with a fully connected layer as the classifier but also function as an efficacious regularization method for fully connected layers. Through experiments involving three distinct ANN architectures and three widely recognized EEG datasets, we validate the potency of weight freezing. Our method significantly surpasses previous peak performances in classification accuracy across all examined datasets. Supplementary control experiments offer insights into performance differences pre and post weight freezing implementation and scrutinize the influence of the threshold in the weight freezing process. Our study underscores the superior efficacy of weight freezing compared to traditional fully connected networks for EEG feature classification tasks. With its proven effectiveness, this innovative approach holds substantial promise for contributing to future strides in EEG decoding research.
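The mask-and-threshold mechanism can be sketched as follows. Selecting which weights to freeze by magnitude quantile is an assumption here, since the abstract does not specify the selection criterion:

```python
import numpy as np

def freeze_mask(W, freeze_fraction=0.3):
    """Mask matrix for weight freezing: a threshold picks the proportion of
    weights to freeze (here, the smallest-magnitude ones)."""
    thresh = np.quantile(np.abs(W), freeze_fraction)
    return (np.abs(W) >= thresh).astype(W.dtype)

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 20))          # fully connected classifier weights
mask = freeze_mask(W, 0.3)
W_frozen = W * mask                    # frozen weights set to zero -> sparse connections
grad = rng.normal(size=W.shape)
update = grad * mask                   # masked backprop: frozen weights get no update

assert np.all(W_frozen[mask == 0] == 0)
assert np.all(update[mask == 0] == 0)
```

Applying the same mask to both the weights and their gradients is what keeps the frozen entries at exactly zero throughout training, acting simultaneously as sparsification and regularization.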
    Adversarial Attack On Yolov5 For Traffic And Road Sign Detection. (arXiv:2306.06071v1 [cs.CV])
    This paper implements and investigates popular adversarial attacks on the YOLOv5 Object Detection algorithm. The paper explores the vulnerability of the YOLOv5 to adversarial attacks in the context of traffic and road sign detection. The paper investigates the impact of different types of attacks, including the Limited memory Broyden Fletcher Goldfarb Shanno (L-BFGS), the Fast Gradient Sign Method (FGSM) attack, the Carlini and Wagner (C&W) attack, the Basic Iterative Method (BIM) attack, the Projected Gradient Descent (PGD) attack, One Pixel Attack, and the Universal Adversarial Perturbations attack on the accuracy of YOLOv5 in detecting traffic and road signs. The results show that YOLOv5 is susceptible to these attacks, with misclassification rates increasing as the magnitude of the perturbations increases. We also explain the results using saliency maps. The findings of this paper have important implications for the safety and reliability of object detection algorithms used in traffic and transportation systems, highlighting the need for more robust and secure models to ensure their effectiveness in real-world applications.
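Of the attacks listed, FGSM is the simplest to sketch. The toy logistic model below stands in for YOLOv5's loss surface; it is an illustration of the attack, not the paper's setup:

```python
import numpy as np

def fgsm(x, grad_wrt_x, eps):
    """Fast Gradient Sign Method: one step in the sign of the input gradient,
    clipped back to the valid pixel range [0, 1]."""
    return np.clip(x + eps * np.sign(grad_wrt_x), 0.0, 1.0)

# Toy logistic model: loss = binary cross-entropy on a linear logit.
rng = np.random.default_rng(0)
w = rng.normal(size=100)
x = rng.uniform(size=100)     # "image" pixels in [0, 1]
y = 1.0                       # true label
p = 1.0 / (1.0 + np.exp(-(x @ w)))
grad = (p - y) * w            # d(BCE)/dx for a linear logit

x_adv = fgsm(x, grad, eps=0.03)
# The perturbation is bounded by eps in the infinity norm.
assert np.max(np.abs(x_adv - x)) <= 0.03 + 1e-12
```

The misclassification-vs-magnitude trend reported in the paper corresponds to sweeping `eps`: larger perturbation budgets move the input further up the loss gradient.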
    QuestEnvSim: Environment-Aware Simulated Motion Tracking from Sparse Sensors. (arXiv:2306.05666v1 [cs.GR])
    Replicating a user's pose from only wearable sensors is important for many AR/VR applications. Most existing methods for motion tracking avoid environment interaction apart from foot-floor contact due to their complex dynamics and hard constraints. However, in daily life people regularly interact with their environment, e.g. by sitting on a couch or leaning on a desk. Using Reinforcement Learning, we show that headset and controller pose, if combined with physics simulation and environment observations, can generate realistic full-body poses even in highly constrained environments. The physics simulation automatically enforces the various constraints necessary for realistic poses, instead of manually specifying them as in many kinematic approaches. These hard constraints allow us to achieve high-quality interaction motions without typical artifacts such as penetration or contact sliding. We discuss three features, the environment representation, the contact reward and scene randomization, crucial to the performance of the method. We demonstrate the generality of the approach through various examples, such as sitting on chairs, a couch and boxes, stepping over boxes, rocking a chair and turning an office chair. We believe these are some of the highest-quality results achieved for motion tracking from sparse sensors with scene interaction.
    Adversarial Evasion Attacks Practicality in Networks: Testing the Impact of Dynamic Learning. (arXiv:2306.05494v1 [cs.CR])
    Machine Learning (ML) has become ubiquitous, and its deployment in Network Intrusion Detection Systems (NIDS) is inevitable due to its automated nature and high accuracy in processing and classifying large volumes of data. However, ML has been found to have several flaws, chief among them adversarial attacks, which aim to trick ML models into producing faulty predictions. While most adversarial attack research focuses on computer vision datasets, recent studies have explored the practicality of such attacks against ML-based network security entities, especially NIDS. This paper presents two distinct contributions: a taxonomy of practicality issues associated with adversarial attacks against ML-based NIDS and an investigation of the impact of continuous training on adversarial attacks against NIDS. Our experiments indicate that continuous re-training, even without adversarial training, can reduce the effect of adversarial attacks. While adversarial attacks can harm ML-based NIDSs, our aim is to highlight that there is a significant gap between research and real-world practicality in this domain, which requires attention.
    L-GreCo: Layerwise-Adaptive Gradient Compression for Efficient and Accurate Deep Learning. (arXiv:2210.17357v2 [cs.LG] UPDATED)
    Data-parallel distributed training of deep neural networks (DNN) has gained widespread adoption, but can still experience communication bottlenecks. To address this issue, entire families of compression mechanisms have been developed, including quantization, sparsification, and low-rank approximation, some of which are seeing significant practical adoption. Despite this progress, almost all known compression schemes apply compression uniformly across DNN layers, although layers are heterogeneous in terms of parameter count and their impact on model accuracy. In this work, we provide a general framework for adapting the degree of compression across the model's layers dynamically during training, improving the overall compression, while leading to substantial speedups, without sacrificing accuracy. Our framework, called L-GreCo, is based on an adaptive algorithm, which automatically picks the optimal compression parameters for model layers guaranteeing the best compression ratio while satisfying an error constraint. Extensive experiments over image classification and language modeling tasks show that L-GreCo is effective across all existing families of compression methods, and achieves up to 2.5$\times$ training speedup and up to 5$\times$ compression improvement over efficient implementations of existing approaches, while recovering full accuracy. Moreover, L-GreCo is complementary to existing adaptive algorithms, improving their compression ratio by 50% and practical throughput by 66%.
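The core idea of picking a per-layer compression level subject to an error constraint can be sketched with top-k gradient sparsification. L-GreCo's actual algorithm solves this globally; the per-layer linear scan below is a deliberate simplification:

```python
import numpy as np

def topk_sparsify(g, k):
    """Keep the k largest-magnitude entries of a gradient, zero the rest."""
    out = np.zeros_like(g)
    idx = np.argsort(np.abs(g))[-k:]
    out[idx] = g[idx]
    return out

def pick_layer_k(g, max_rel_error):
    """Smallest k whose relative compression error stays within the budget,
    so heterogeneous layers end up with different compression levels."""
    norm = np.linalg.norm(g)
    for k in range(1, g.size + 1):
        err = np.linalg.norm(g - topk_sparsify(g, k)) / norm
        if err <= max_rel_error:
            return k
    return g.size

rng = np.random.default_rng(0)
# Two "layers" with different gradient statistics.
layers = [rng.normal(size=50), rng.laplace(size=50)]
ks = [pick_layer_k(g, max_rel_error=0.2) for g in layers]
assert all(1 <= k <= 50 for k in ks)
```

A heavy-tailed layer concentrates its gradient norm in few entries and therefore tolerates a smaller `k` than a layer with evenly spread gradients, which is exactly the heterogeneity a uniform scheme ignores.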
    Improving the Model Consistency of Decentralized Federated Learning. (arXiv:2302.04083v2 [cs.LG] UPDATED)
    To mitigate the privacy leakages and communication burdens of Federated Learning (FL), decentralized FL (DFL) discards the central server and each client only communicates with its neighbors in a decentralized communication network. However, existing DFL suffers from high inconsistency among local clients, which results in severe distribution shift and inferior performance compared with centralized FL (CFL), especially on heterogeneous data or sparse communication topology. To alleviate this issue, we propose two DFL algorithms named DFedSAM and DFedSAM-MGS to improve the performance of DFL. Specifically, DFedSAM leverages gradient perturbation to generate local flat models via Sharpness Aware Minimization (SAM), which searches for models with uniformly low loss values. DFedSAM-MGS further boosts DFedSAM by adopting Multiple Gossip Steps (MGS) for better model consistency, which accelerates the aggregation of local flat models and better balances communication complexity and generalization. Theoretically, we present improved convergence rates $\small \mathcal{O}\big(\frac{1}{\sqrt{KT}}+\frac{1}{T}+\frac{1}{K^{1/2}T^{3/2}(1-\lambda)^2}\big)$ and $\small \mathcal{O}\big(\frac{1}{\sqrt{KT}}+\frac{1}{T}+\frac{\lambda^Q+1}{K^{1/2}T^{3/2}(1-\lambda^Q)^2}\big)$ in non-convex setting for DFedSAM and DFedSAM-MGS, respectively, where $1-\lambda$ is the spectral gap of gossip matrix and $Q$ is the number of MGS. Empirically, our methods can achieve competitive performance compared with CFL methods and outperform existing DFL methods.
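The SAM inner step that DFedSAM applies to each local model can be sketched on a toy quadratic; the hyperparameters are illustrative:

```python
import numpy as np

def sam_step(w, grad_fn, rho=0.05, lr=0.1):
    """One Sharpness-Aware Minimization step: perturb the weights toward the
    locally worst (ascent) direction, then descend using the gradient taken
    at that perturbed point, favoring uniformly flat minima."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # gradient-normalized perturbation
    g_sharp = grad_fn(w + eps)                     # gradient at the perturbed weights
    return w - lr * g_sharp

# Quadratic bowl standing in for a local client's loss landscape.
grad_fn = lambda w: 4.0 * w
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w, grad_fn)

# Iterates settle within roughly rho of the minimum rather than at it exactly,
# since the perturbation keeps probing the neighborhood.
assert np.linalg.norm(w) < 0.05
```

In DFedSAM each client runs such steps locally before gossiping with its neighbors; the flatness of the resulting local models is what reduces inconsistency when they are averaged.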
    SERT: A Transformer Based Model for Spatio-Temporal Sensor Data with Missing Values for Environmental Monitoring. (arXiv:2306.03042v2 [cs.LG] UPDATED)
    Environmental monitoring is crucial to our understanding of climate change, biodiversity loss and pollution. The availability of large-scale spatio-temporal data from sources such as sensors and satellites allows us to develop sophisticated models for forecasting and understanding key drivers. However, the data collected from sensors often contain missing values due to faulty equipment or maintenance issues. The missing values rarely occur simultaneously, leading to multivariate, misaligned, sparse time series. We propose two models that are capable of performing multivariate spatio-temporal forecasting while handling missing data naturally without the need for imputation. The first model is a transformer-based model, which we name SERT (Spatio-temporal Encoder Representations from Transformers). The second is a simpler model named SST-ANN (Sparse Spatio-Temporal Artificial Neural Network) which is capable of providing interpretable results. We conduct extensive experiments on two different datasets for multivariate spatio-temporal forecasting and show that our models have competitive or superior performance to those at the state-of-the-art.
    Time Series Continuous Modeling for Imputation and Forecasting with Implicit Neural Representations. (arXiv:2306.05880v1 [cs.LG])
    Although widely explored, time series modeling continues to encounter significant challenges when confronted with real-world data. We propose a novel modeling approach leveraging Implicit Neural Representations (INR). This approach enables us to effectively capture the continuous aspect of time series and provides a natural solution to recurring modeling issues such as handling missing data, dealing with irregular sampling, or unaligned observations from multiple sensors. By introducing conditional modulation of INR parameters and leveraging meta-learning techniques, we address the issue of generalization to both unseen samples and time window shifts. Through extensive experimentation, our model demonstrates state-of-the-art performance in forecasting and imputation tasks, while exhibiting flexibility in handling a wide range of challenging scenarios that competing models cannot.
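How an INR sidesteps imputation can be illustrated with a fixed sinusoidal encoding and a linear readout; this is a much simpler stand-in for the conditionally modulated, meta-learned INRs in the paper:

```python
import numpy as np

def fourier_features(t, n_freq=8):
    """Map continuous timestamps to sinusoidal features (a simple INR encoding)."""
    freqs = 2.0 ** np.arange(n_freq)
    arg = np.outer(t, freqs) * np.pi
    return np.concatenate([np.sin(arg), np.cos(arg)], axis=1)

# Irregularly sampled signal: no imputation is needed, because the model is
# simply fitted at whatever timestamps were actually observed.
rng = np.random.default_rng(0)
t_obs = np.sort(rng.uniform(0, 1, size=40))
y_obs = np.sin(2 * np.pi * t_obs)

# Fit a linear readout on top of the features (ridge regression).
Phi = fourier_features(t_obs)
w = np.linalg.solve(Phi.T @ Phi + 1e-6 * np.eye(Phi.shape[1]), Phi.T @ y_obs)

# Query the continuous representation at new, unaligned timestamps.
t_new = np.linspace(0.1, 0.9, 50)
y_hat = fourier_features(t_new) @ w
assert np.max(np.abs(y_hat - np.sin(2 * np.pi * t_new))) < 0.1
```

Because the representation is a function of continuous time, imputation and forecasting are both just evaluations of the fitted function at unobserved timestamps.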
    Self-Distillation for Further Pre-training of Transformers. (arXiv:2210.02871v3 [cs.CV] UPDATED)
    Pre-training a large transformer model on a massive amount of unlabeled data and fine-tuning it on labeled datasets for diverse downstream tasks has proven to be a successful strategy for a variety of vision and natural language processing tasks. However, direct fine-tuning of the pre-trained model may be suboptimal if there exist large discrepancies across data domains for pre-training and fine-tuning. To tackle this issue, several previous studies have proposed further pre-training strategies, where we continue to pre-train the model on the target unlabeled dataset before fine-tuning. However, all of them solely focus on language models, and we empirically find that a Vision Transformer is vulnerable to overfitting as we continue to pre-train the model on target unlabeled data. In order to tackle this limitation, we propose self-distillation as a regularization for a further pre-training stage. Specifically, we first further pre-train the initial pre-trained model on the target unlabeled data and then consider it as a teacher for self-distillation. Then we take the same initial pre-trained model as a student and enforce its hidden representations to be close to those of the teacher while optimizing the student with a masked auto-encoding objective. We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks. Experimentally, we show that our proposed method outperforms all the relevant baselines. Theoretically, we analyze the proposed method with a simplified model to understand how self-distillation for further pre-training can potentially help improve the performance of the downstream tasks.
    PFNs4BO: In-Context Learning for Bayesian Optimization. (arXiv:2305.17535v3 [cs.LG] UPDATED)
    In this paper, we use Prior-data Fitted Networks (PFNs) as a flexible surrogate for Bayesian Optimization (BO). PFNs are neural processes that are trained to approximate the posterior predictive distribution (PPD) through in-context learning on any prior distribution that can be efficiently sampled from. We describe how this flexibility can be exploited for surrogate modeling in BO. We use PFNs to mimic a naive Gaussian process (GP), an advanced GP, and a Bayesian Neural Network (BNN). In addition, we show how to incorporate further information into the prior, such as allowing hints about the position of optima (user priors), ignoring irrelevant dimensions, and performing non-myopic BO by learning the acquisition function. The flexibility underlying these extensions opens up vast possibilities for using PFNs for BO. We demonstrate the usefulness of PFNs for BO in a large-scale evaluation on artificial GP samples and three different hyperparameter optimization testbeds: HPO-B, Bayesmark, and PD1. We publish code alongside trained models at https://github.com/automl/PFNs4BO.
    Overcoming Adversarial Attacks for Human-in-the-Loop Applications. (arXiv:2306.05952v1 [cs.LG])
    Including human analysis has the potential to positively affect the robustness of Deep Neural Networks and is relatively unexplored in the Adversarial Machine Learning literature. Neural network visual explanation maps have been shown to be prone to adversarial attacks. Further research is needed in order to select robust visualizations of explanations for the image analyst to evaluate a given model. These factors greatly impact Human-In-The-Loop (HITL) evaluation tools due to their reliance on adversarial images, including explanation maps and measurements of robustness. We believe models of human visual attention may improve interpretability and robustness of human-machine imagery analysis systems. Our challenge remains: how can HITL evaluation be made robust in this adversarial landscape?
    ExplainableFold: Understanding AlphaFold Prediction with Explainable AI. (arXiv:2301.11765v2 [cs.AI] UPDATED)
    This paper presents ExplainableFold, an explainable AI framework for protein structure prediction. Despite the success of AI-based methods such as AlphaFold in this field, the underlying reasons for their predictions remain unclear due to the black-box nature of deep learning models. To address this, we propose a counterfactual learning framework inspired by biological principles to generate counterfactual explanations for protein structure prediction, enabling a dry-lab experimentation approach. Our experimental results demonstrate the ability of ExplainableFold to generate high-quality explanations for AlphaFold's predictions, providing near-experimental understanding of the effects of amino acids on 3D protein structure. This framework has the potential to facilitate a deeper understanding of protein structures.
    Efficient Personalized Federated Learning via Sparse Model-Adaptation. (arXiv:2305.02776v2 [cs.LG] UPDATED)
    Federated Learning (FL) aims to train machine learning models for multiple clients without sharing their own private data. Due to the heterogeneity of clients' local data distribution, recent studies explore the personalized FL that learns and deploys distinct local models with the help of auxiliary global models. However, the clients can be heterogeneous in terms of not only local data distribution, but also their computation and communication resources. The capacity and efficiency of personalized models are restricted by the lowest-resource clients, leading to sub-optimal performance and limited practicality of personalized FL. To overcome these challenges, we propose a novel approach named pFedGate for efficient personalized FL by adaptively and efficiently learning sparse local models. With a lightweight trainable gating layer, pFedGate enables clients to reach their full potential in model capacity by generating different sparse models accounting for both the heterogeneous data distributions and resource constraints. Meanwhile, the computation and communication efficiency are both improved thanks to the adaptability between the model sparsity and clients' resources. Further, we theoretically show that the proposed pFedGate has superior complexity with guaranteed convergence and generalization error. Extensive experiments show that pFedGate achieves superior global accuracy, individual accuracy and efficiency simultaneously over state-of-the-art methods. We also demonstrate that pFedGate performs better than competitors in the novel clients participation and partial clients participation scenarios, and can learn meaningful sparse local models adapted to different data distributions.
    Feature Selection on Sentinel-2 Multi-spectral Imagery for Efficient Tree Cover Estimation. (arXiv:2306.06073v1 [cs.CV])
    This paper proposes a multi-spectral random forest classifier with suitable feature selection and masking for tree cover estimation in urban areas. The key feature of the proposed classifier is filtering out the built-up region using spectral indices followed by random forest classification on the remaining mask with carefully selected features. Using Sentinel-2 satellite imagery, we evaluate the performance of the proposed technique on a specified area (approximately 82 acres) of Lahore University of Management Sciences (LUMS) and demonstrate that our method outperforms a conventional random forest classifier as well as state-of-the-art methods such as the European Space Agency (ESA) WorldCover 10m 2020 product and a DeepLabv3 deep learning architecture.
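A toy version of the filter-then-classify pipeline using common Sentinel-2 spectral indices. The specific indices (NDVI, NDBI), band choices, and thresholds here are illustrative assumptions, not necessarily the paper's exact selections:

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index from bands B8 (NIR) and B4 (red)."""
    return (nir - red) / (nir + red + 1e-12)

def ndbi(swir, nir):
    """Normalized Difference Built-up Index from bands B11 (SWIR) and B8 (NIR)."""
    return (swir - nir) / (swir + nir + 1e-12)

# Toy reflectance values: one vegetated pixel, one built-up pixel.
nir  = np.array([0.45, 0.25])
red  = np.array([0.05, 0.20])
swir = np.array([0.10, 0.35])

built_up = ndbi(swir, nir) > 0   # crude built-up mask (threshold is illustrative)
remaining = ~built_up            # only these pixels reach the random forest

assert built_up.tolist() == [False, True]
assert ndvi(nir, red)[0] > 0.5   # vegetated pixel has high NDVI
```

Masking out built-up pixels first means the classifier only has to separate trees from other vegetation and bare ground, a much easier decision than distinguishing trees from rooftops spectrally.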
    End-to-End Neural Network Compression via $\frac{\ell_1}{\ell_2}$ Regularized Latency Surrogates. (arXiv:2306.05785v1 [cs.LG])
    Neural network (NN) compression via techniques such as pruning and quantization requires setting compression hyperparameters (e.g., number of channels to be pruned, bitwidths for quantization) for each layer either manually or via neural architecture search (NAS), which can be computationally expensive. We address this problem by providing an end-to-end technique that optimizes for a model's Floating Point Operations (FLOPs) or for on-device latency via a novel $\frac{\ell_1}{\ell_2}$ latency surrogate. Our algorithm is versatile and can be used with many popular compression methods including pruning, low-rank factorization, and quantization. Crucially, it is fast and runs in almost the same amount of time as single model training; which is a significant training speed-up over standard NAS methods. For BERT compression on GLUE fine-tuning tasks, we achieve $50\%$ reduction in FLOPs with only $1\%$ drop in performance. For compressing MobileNetV3 on ImageNet-1K, we achieve $15\%$ reduction in FLOPs, and $11\%$ reduction in on-device latency without drop in accuracy, while still requiring $3\times$ less training compute than SOTA compression techniques. Finally, for transfer learning on smaller datasets, our technique identifies $1.2\times$-$1.4\times$ cheaper architectures than standard MobileNetV3, EfficientNet suite of architectures at almost the same training cost and accuracy.
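One reason the $\frac{\ell_1}{\ell_2}$ ratio works as a surrogate is that, for a vector with $k$ equal-magnitude non-zeros, it equals $\sqrt{k}$, so it tracks the effective number of active channels while remaining differentiable. A small check of that property (the framing as a FLOPs proxy is the paper's; the toy vectors are ours):

```python
import numpy as np

def l1_over_l2(w, eps=1e-12):
    """The l1/l2 ratio: a differentiable surrogate for the number of non-zero
    entries (and hence the FLOPs) contributed by a block of weights."""
    return np.sum(np.abs(w)) / (np.linalg.norm(w) + eps)

dense  = np.ones(100)                    # all 100 channels active
sparse = np.zeros(100); sparse[:5] = 1.0  # only 5 channels active

# For a k-sparse vector with equal magnitudes, l1/l2 = sqrt(k).
assert np.isclose(l1_over_l2(dense), 10.0)        # sqrt(100)
assert np.isclose(l1_over_l2(sparse), np.sqrt(5))  # sqrt(5)
```

Penalizing this ratio during training therefore pushes each layer toward fewer active channels without requiring a discrete search over pruning configurations.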
    Adaptive Conditional Quantile Neural Processes. (arXiv:2305.18777v2 [cs.LG] UPDATED)
    Neural processes are a family of probabilistic models that inherit the flexibility of neural networks to parameterize stochastic processes. Despite providing well-calibrated predictions, especially in regression problems, and quick adaptation to new tasks, the Gaussian assumption that is commonly used to represent the predictive likelihood fails to capture more complicated distributions such as multimodal ones. To overcome this limitation, we propose Conditional Quantile Neural Processes (CQNPs), a new member of the neural processes family, which exploits the attractive properties of quantile regression in modeling the distributions irrespective of their form. By introducing an extension of quantile regression where the model learns to focus on estimating informative quantiles, we show that the sampling efficiency and prediction accuracy can be further enhanced. Our experiments with real and synthetic datasets demonstrate substantial improvements in predictive performance compared to the baselines, and better modeling of heterogeneous distributions' characteristics such as multimodality.
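The building block behind quantile-based predictive distributions is the pinball (quantile) loss, whose expected-loss minimizer is the $\tau$-th quantile of the target; a small sketch, independent of the paper's neural-process architecture:

```python
def pinball_loss(y_true, y_pred, tau):
    """Asymmetric quantile loss: under-predictions are weighted by tau and
    over-predictions by (1 - tau), so minimizing the expected loss yields
    the tau-th quantile of the target distribution rather than the mean."""
    diff = y_true - y_pred
    return max(tau * diff, (tau - 1.0) * diff)
```

Predicting several quantiles this way traces out the predictive distribution without any Gaussian assumption, which is what lets quantile methods capture multimodal targets.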
    Spawrious: A Benchmark for Fine Control of Spurious Correlation Biases. (arXiv:2303.05470v2 [cs.CV] UPDATED)
    The problem of spurious correlations (SCs) arises when a classifier relies on non-predictive features that happen to be correlated with the labels in the training data. For example, a classifier may misclassify dog breeds based on the background of dog images. This happens when the backgrounds are correlated with other breeds in the training data, leading to misclassifications during test time. Previous SC benchmark datasets suffer from varying issues, e.g., over-saturation or only containing one-to-one (O2O) SCs, but no many-to-many (M2M) SCs arising between groups of spurious attributes and classes. In this paper, we present Spawrious-{O2O, M2M}-{Easy, Medium, Hard}, an image classification benchmark suite containing spurious correlations among different dog breeds and background locations. To create this dataset, we employ a text-to-image model to generate photo-realistic images, and an image captioning model to filter out unsuitable ones. The resulting dataset is of high quality, containing approximately 152,000 images. Our experimental results demonstrate that state-of-the-art group robustness methods struggle with Spawrious, most notably on the Hard-splits with $<60\%$ accuracy. By examining model misclassifications, we detect reliances on spurious backgrounds, demonstrating that our dataset provides a significant challenge to drive future research.
    No-Reference Point Cloud Quality Assessment via Weighted Patch Quality Prediction. (arXiv:2305.07829v2 [cs.CV] UPDATED)
With the rapid development of 3D vision applications based on point clouds, point cloud quality assessment (PCQA) is becoming an important research topic. However, prior PCQA methods ignore the effect of local quality variance across different areas of the point cloud. To take advantage of this quality distribution imbalance, we propose a no-reference point cloud quality assessment (NR-PCQA) method with local area correlation analysis capability, denoted as COPP-Net. More specifically, we split a point cloud into patches, generate texture and structure features for each patch, and fuse them into patch features to predict patch quality. Then, we gather the features of all the patches of a point cloud for correlation analysis to obtain the correlation weights. Finally, the predicted qualities and correlation weights for all the patches are used to derive the final quality score. Experimental results show that our method outperforms state-of-the-art benchmark NR-PCQA methods. The source code for the proposed COPP-Net can be found at https://github.com/philox12358/COPP-Net.
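The final aggregation stage described above reduces to a correlation-weighted average of the per-patch quality predictions; a minimal sketch (function and argument names are illustrative, not taken from the released code):

```python
def pointcloud_quality(patch_scores, corr_weights):
    """Combine per-patch quality predictions using correlation weights:
    patches judged more representative of the whole cloud contribute more
    to the final quality score."""
    total = sum(corr_weights)
    return sum(q * w for q, w in zip(patch_scores, corr_weights)) / total
```

With uniform weights this collapses to a plain mean, which is exactly the local-variance-blind behavior the method is designed to avoid.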
    Voxel-wise classification for porosity investigation of additive manufactured parts with 3D unsupervised and (deeply) supervised neural networks. (arXiv:2305.07894v2 [cs.CE] UPDATED)
Additive Manufacturing (AM) has emerged as a manufacturing process that allows the direct production of samples from digital models. To ensure that quality standards are met in all manufactured samples of a batch, X-ray computed tomography (X-CT) is often used combined with automated anomaly detection. For the latter, deep learning (DL) anomaly detection techniques are increasingly used, as they can be trained to be robust to the material being analysed and resilient towards poor image quality. Unfortunately, most recent and popular DL models have been developed for 2D image processing, thereby disregarding valuable volumetric information. This study revisits recent supervised (UNet, UNet++, UNet 3+, MSS-UNet) and unsupervised (VAE, ceVAE, gmVAE, vqVAE) DL models for porosity analysis of AM samples from X-CT images and extends them to accept 3D input data with a 3D-patch pipeline for lower computational requirements, improved efficiency and generalisability. The supervised models were trained using the Focal Tversky loss to address the class imbalance that arises from the low porosity in the training datasets. The output of the unsupervised models is post-processed to reduce misclassifications caused by their inability to adequately represent the object surface. The findings were cross-validated in a 5-fold fashion and include: a performance benchmark of the DL models, an evaluation of the post-processing algorithm, and an evaluation of the effect of training supervised models with the output of unsupervised models. In a final performance benchmark on a test set with poor image quality, the best performing supervised model was UNet++ with an average precision of 0.751 $\pm$ 0.030, while the best unsupervised model was the post-processed ceVAE with 0.830 $\pm$ 0.003. The VAE/ceVAE models demonstrated superior capabilities, particularly when leveraging post-processing techniques.
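The class-imbalance remedy mentioned above, the Focal Tversky loss, fits in a few lines over flattened binary masks. The alpha/beta/gamma values below are common defaults from the literature, not necessarily those used in this study:

```python
def focal_tversky_loss(pred, target, alpha=0.7, beta=0.3, gamma=0.75):
    """Focal Tversky loss on flattened soft predictions and binary targets.

    The Tversky index weights false negatives (alpha) against false
    positives (beta); with alpha > beta, missing rare foreground voxels
    (e.g., pores) is penalized more. The focal exponent gamma further
    emphasizes hard examples.
    """
    tp = sum(p * t for p, t in zip(pred, target))
    fn = sum((1 - p) * t for p, t in zip(pred, target))
    fp = sum(p * (1 - t) for p, t in zip(pred, target))
    tversky = (tp + 1e-7) / (tp + alpha * fn + beta * fp + 1e-7)
    return (1.0 - tversky) ** gamma
```

A perfect segmentation drives the Tversky index to 1 and the loss to 0, while missed pores are punished harder than spurious ones.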
    Causal Strategic Classification: A Tale of Two Shifts. (arXiv:2302.06280v3 [cs.LG] UPDATED)
When users can benefit from certain predictive outcomes, they may be prone to act to achieve those outcomes, e.g., by strategically modifying their features. The goal in strategic classification is therefore to train predictive models that are robust to such behavior. However, the conventional framework assumes that changing features does not change actual outcomes, which depicts users as "gaming" the system. Here we remove this assumption, and study learning in a causal strategic setting where true outcomes do change. Focusing on accuracy as our primary objective, we show how strategic behavior and causal effects underlie two complementary forms of distribution shift. We characterize these shifts, and propose a learning algorithm that balances these two forces over time and permits end-to-end training. Experiments on synthetic and semi-synthetic data demonstrate the utility of our approach.
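To make the strategic-behavior assumption concrete, here is a textbook-style toy best response in one dimension: a user moves their feature past the decision threshold only when the classification gain outweighs the movement cost. This is only an illustration of the standard setting; the paper's causal variant additionally lets the true outcome change with the feature.

```python
def best_response(x, threshold, cost_per_unit, gain=1.0):
    """A strategic user with scalar feature x facing a threshold classifier:
    move up to the threshold if the gain from a positive classification
    exceeds the linear movement cost, otherwise stay put."""
    gap = threshold - x
    if 0.0 < gap and gap * cost_per_unit < gain:
        return threshold
    return x
```

Training against such responses is what makes the induced feature distribution shift, one of the two shifts the abstract refers to.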
    RANS-PINN based Simulation Surrogates for Predicting Turbulent Flows. (arXiv:2306.06034v1 [cs.LG])
Physics-informed neural networks (PINNs) provide a framework to build surrogate models for dynamical systems governed by differential equations. During the learning process, PINNs incorporate a physics-based regularization term within the loss function to enhance generalization performance. Since simulating dynamics controlled by partial differential equations (PDEs) can be computationally expensive, PINNs have gained popularity in learning parametric surrogates for fluid flow problems governed by the Navier-Stokes equations. In this work, we introduce RANS-PINN, a modified PINN framework, to predict flow fields (i.e., velocity and pressure) in the high-Reynolds-number turbulent flow regime. To account for the additional complexity introduced by turbulence, RANS-PINN employs a 2-equation eddy viscosity model based on a Reynolds-averaged Navier-Stokes (RANS) formulation. Furthermore, we adopt a novel training approach that ensures effective initialization and balance among the various components of the loss function. The effectiveness of the RANS-PINN framework is then demonstrated using a parametric PINN.
    EmotionNAS: Two-stream Neural Architecture Search for Speech Emotion Recognition. (arXiv:2203.13617v2 [eess.AS] UPDATED)
Speech emotion recognition (SER) is an important research topic in human-computer interaction. Existing works mainly rely on human expertise to design models. Despite their success, different datasets often require distinct structures and hyperparameters. Searching for an optimal model for each dataset is time-consuming and labor-intensive. To address this problem, we propose a two-stream neural architecture search (NAS) based framework, called "EmotionNAS". Specifically, we take two-stream features (i.e., handcrafted and deep features) as the inputs, followed by NAS to search for the optimal structure for each stream. Furthermore, we incorporate complementary information in different streams through an efficient information supplement module. Experimental results demonstrate that our method outperforms existing manually-designed and NAS-based models, setting a new state-of-the-art record.
    Unsupervised hierarchical clustering using the learning dynamics of RBMs. (arXiv:2302.01851v3 [cs.LG] UPDATED)
Datasets in the real world are often complex and to some degree hierarchical, with groups and sub-groups of data sharing common characteristics at different levels of abstraction. Understanding and uncovering the hidden structure of these datasets is an important task that has many practical applications. To address this challenge, we present a new and general method for building relational data trees by exploiting the learning dynamics of the Restricted Boltzmann Machine (RBM). Our method is based on the mean-field approach, derived from the Plefka expansion, and developed in the context of disordered systems. It is designed to be easily interpretable. We tested our method on an artificially created hierarchical dataset and on three different real-world datasets (images of digits, mutations in the human genome, and a homologous family of proteins). The method is able to automatically identify the hierarchical structure of the data. This could be useful in the study of homologous protein sequences, where the relationships between proteins are critical for understanding their function and evolution.
    MonoFlow: Rethinking Divergence GANs via the Perspective of Wasserstein Gradient Flows. (arXiv:2302.01075v3 [stat.ML] UPDATED)
The conventional understanding of adversarial training in generative adversarial networks (GANs) is that the discriminator is trained to estimate a divergence, and the generator learns to minimize this divergence. We argue that despite the fact that many variants of GANs were developed following this paradigm, the current theoretical understanding of GANs and their practical algorithms are inconsistent. In this paper, we leverage Wasserstein gradient flows, which characterize the evolution of particles in the sample space, to gain theoretical insights into and algorithmic inspiration for GANs. We introduce a unified generative modeling framework - MonoFlow: the particle evolution is rescaled via a monotonically increasing mapping of the log density ratio. Under our framework, adversarial training can be viewed as a procedure that first obtains MonoFlow's vector field via training the discriminator, after which the generator learns to draw the particle flow defined by the corresponding vector field. We also reveal the fundamental difference between variational divergence minimization and adversarial training. This analysis helps us identify which types of generator loss functions can lead to the successful training of GANs and suggests that GANs may admit more loss designs than those in the literature (e.g., the non-saturating loss), as long as they realize MonoFlow. Consistent empirical studies are included to validate the effectiveness of our framework.
    MGTBench: Benchmarking Machine-Generated Text Detection. (arXiv:2303.14822v2 [cs.CR] UPDATED)
Nowadays, large language models (LLMs) have shown revolutionary power in a variety of natural language processing (NLP) tasks such as text classification, sentiment analysis, language translation, and question-answering. Detecting machine-generated texts (MGTs) is therefore becoming increasingly important as LLMs become more advanced and prevalent. These models can generate human-like language that can be difficult to distinguish from text written by a human, which raises concerns about authenticity, accountability, and potential bias. However, existing detection methods against MGTs are evaluated under different model architectures, datasets, and experimental settings, resulting in a lack of a comprehensive evaluation framework across different methodologies. In this paper, we fill this gap by proposing the first benchmark framework for MGT detection, named MGTBench. Extensive evaluations on public datasets with curated answers generated by ChatGPT (the most representative and powerful LLM thus far) show that most of the current detection methods perform less satisfactorily against MGTs. An exceptional case is the ChatGPT Detector, which is trained with ChatGPT-generated texts and shows great performance in detecting MGTs. Nonetheless, we note that only a small fraction of adversarially crafted perturbations on MGTs can evade the ChatGPT Detector, thus highlighting the need for more robust MGT detection methods. We envision that MGTBench will serve as a benchmark tool to accelerate future investigations involving the evaluation of state-of-the-art MGT detection methods on their respective datasets and the development of more advanced MGT detection methods. Our source code and datasets are available at https://github.com/xinleihe/MGTBench.
    Double-Weighting for Covariate Shift Adaptation. (arXiv:2305.08637v3 [stat.ML] UPDATED)
    Supervised learning is often affected by a covariate shift in which the marginal distributions of instances (covariates $x$) of training and testing samples $\mathrm{p}_\text{tr}(x)$ and $\mathrm{p}_\text{te}(x)$ are different but the label conditionals coincide. Existing approaches address such covariate shift by either using the ratio $\mathrm{p}_\text{te}(x)/\mathrm{p}_\text{tr}(x)$ to weight training samples (reweighted methods) or using the ratio $\mathrm{p}_\text{tr}(x)/\mathrm{p}_\text{te}(x)$ to weight testing samples (robust methods). However, the performance of such approaches can be poor under support mismatch or when the above ratios take large values. We propose a minimax risk classification (MRC) approach for covariate shift adaptation that avoids such limitations by weighting both training and testing samples. In addition, we develop effective techniques that obtain both sets of weights and generalize the conventional kernel mean matching method. We provide novel generalization bounds for our method that show a significant increase in the effective sample size compared with reweighted methods. The proposed method also achieves enhanced classification performance in both synthetic and empirical experiments.
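For contrast with the proposed double weighting, the classical reweighted approach the abstract refers to assigns each training sample the density ratio $\mathrm{p}_\text{te}(x)/\mathrm{p}_\text{tr}(x)$. A minimal sketch of that baseline (the clipping constant is an illustrative safeguard; this is the method the paper improves upon, not the paper's method):

```python
def importance_weights(p_te, p_tr, clip=10.0):
    """Classical covariate-shift reweighting: weight each training sample
    by p_te(x)/p_tr(x). When the ratio explodes (tiny p_tr under the test
    support) the effective sample size collapses, which is the failure
    mode that motivates weighting both training and testing samples."""
    return [min(te / max(tr, 1e-12), clip) for te, tr in zip(p_te, p_tr)]
```

The clipping hides, rather than fixes, the large-ratio problem, which is why a principled two-sided weighting can give better generalization bounds.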
    GPT-PINN: Generative Pre-Trained Physics-Informed Neural Networks toward non-intrusive Meta-learning of parametric PDEs. (arXiv:2303.14878v3 [math.NA] UPDATED)
Physics-Informed Neural Network (PINN) has proven itself a powerful tool to obtain the numerical solutions of nonlinear partial differential equations (PDEs), leveraging the expressivity of deep neural networks and the computing power of modern heterogeneous hardware. However, its training is still time-consuming, especially in multi-query and real-time simulation settings, and its parameterization is often excessive. In this paper, we propose the Generative Pre-Trained PINN (GPT-PINN) to mitigate both challenges in the setting of parametric PDEs. GPT-PINN represents a brand-new meta-learning paradigm for parametric systems. As a network of networks, its outer-/meta-network is hyper-reduced, with only one hidden layer having a significantly reduced number of neurons. Moreover, its activation function at each hidden neuron is a (full) PINN pre-trained at a judiciously selected system configuration. The meta-network adaptively "learns" the parametric dependence of the system and "grows" this hidden layer one neuron at a time. In the end, by encompassing a very small number of networks trained at this set of adaptively-selected parameter values, the meta-network is capable of generating surrogate solutions for the parametric system across the entire parameter domain accurately and efficiently.
    A Novel Correlation-optimized Deep Learning Method for Wind Speed Forecast. (arXiv:2306.01986v2 [cs.LG] UPDATED)
The increasing installation rate of wind power poses great challenges to the global power system. In order to ensure the reliable operation of the power system, it is necessary to accurately forecast the wind speed and power of wind turbines. At present, deep learning is progressively applied to wind speed prediction. Nevertheless, recent deep learning methods remain difficult to apply in practice due to limited model interpretability and hardware constraints. To this end, a novel deep knowledge-based learning method is proposed in this paper. The proposed method hybridizes a pre-training method and an auto-encoder structure to improve data representation and modeling within the deep knowledge-based learning framework. In order to form knowledge and corresponding absorbers, the original data is preprocessed by an optimization model based on correlation to construct multi-layer networks (knowledge), which are absorbed by sequence-to-sequence (Seq2Seq) models. Specifically, new cognition and memory units (CMU) are designed to reinforce the traditional deep learning framework. Finally, the effectiveness of the proposed method is verified by three wind prediction cases from a wind farm in Liaoning, China. Experimental results show that the proposed method improves stability and training efficiency compared to the traditional LSTM method and LSTM/GRU-based Seq2Seq methods in wind speed forecasting applications.
    Hierarchical forecasting for aggregated curves with an application to day-ahead electricity price auctions. (arXiv:2305.16255v1 [stat.AP] CROSS LISTED)
    Aggregated curves are common structures in economics and finance, and the most prominent examples are supply and demand curves. In this study, we exploit the fact that all aggregated curves have an intrinsic hierarchical structure, and thus hierarchical reconciliation methods can be used to improve the forecast accuracy. We provide an in-depth theory on how aggregated curves can be constructed or deconstructed, and conclude that these methods are equivalent under weak assumptions. We consider multiple reconciliation methods for aggregated curves, including previously established bottom-up, top-down, and linear optimal reconciliation approaches. We also present a new benchmark reconciliation method called 'aggregated-down' with similar complexity to bottom-up and top-down approaches, but it tends to provide better accuracy in this setup. We conducted an empirical forecasting study on the German day-ahead power auction market by predicting the demand and supply curves, where their equilibrium determines the electricity price for the next day. Our results demonstrate that hierarchical reconciliation methods can be used to improve the forecasting accuracy of aggregated curves.
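The two classical baselines mentioned above, bottom-up and top-down reconciliation, are simple to state; a minimal numeric sketch follows (the proposed 'aggregated-down' method is more involved and is not reproduced here):

```python
def bottom_up(bottom_forecasts):
    """Bottom-up reconciliation: forecast each disaggregated series and sum
    them, so the hierarchy is coherent by construction."""
    return sum(bottom_forecasts)

def top_down(total_forecast, proportions):
    """Top-down reconciliation: forecast the aggregate and split it by
    (e.g., historical) proportions, which must sum to one."""
    return [total_forecast * p for p in proportions]
```

Either way, the reconciled forecasts satisfy the aggregation constraint exactly, which is the coherence property that hierarchical methods exploit to improve accuracy.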
    SENS: Sketch-based Implicit Neural Shape Modeling. (arXiv:2306.06088v1 [cs.GR])
We present SENS, a novel method for generating and editing 3D models from hand-drawn sketches, including those of an abstract nature. Our method allows users to quickly and easily sketch a shape, and then maps the sketch into the latent space of a part-aware neural implicit shape architecture. SENS analyzes the sketch and encodes its parts into a ViT patch encoding, then feeds them into a transformer decoder that converts them to shape embeddings suitable for editing 3D neural implicit shapes. SENS not only provides intuitive sketch-based generation and editing, but also excels in capturing the intent of the user's sketch to generate a variety of novel and expressive 3D shapes, even from abstract sketches. We demonstrate the effectiveness of our model compared to the state-of-the-art using objective metric evaluation criteria and a decisive user study, both indicating strong performance on sketches with a medium level of abstraction. Furthermore, we showcase its intuitive sketch-based shape editing capabilities.
    Error Feedback Can Accurately Compress Preconditioners. (arXiv:2306.06098v1 [cs.LG])
    Leveraging second-order information at the scale of deep networks is one of the main lines of approach for improving the performance of current optimizers for deep learning. Yet, existing approaches for accurate full-matrix preconditioning, such as Full-Matrix Adagrad (GGT) or Matrix-Free Approximate Curvature (M-FAC) suffer from massive storage costs when applied even to medium-scale models, as they must store a sliding window of gradients, whose memory requirements are multiplicative in the model dimension. In this paper, we address this issue via an efficient and simple-to-implement error-feedback technique that can be applied to compress preconditioners by up to two orders of magnitude in practice, without loss of convergence. Specifically, our approach compresses the gradient information via sparsification or low-rank compression \emph{before} it is fed into the preconditioner, feeding the compression error back into future iterations. Extensive experiments on deep neural networks for vision show that this approach can compress full-matrix preconditioners by up to two orders of magnitude without impact on accuracy, effectively removing the memory overhead of full-matrix preconditioning for implementations of full-matrix Adagrad (GGT) and natural gradient (M-FAC). Our code is available at https://github.com/IST-DASLab/EFCP.
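The error-feedback loop itself is compact: add the carried-over residual to the new gradient, compress, and carry the new residual forward. A pure-Python sketch with top-k sparsification as the compressor (the actual implementation operates on the preconditioner's gradient window and is far more optimized):

```python
def topk_with_error_feedback(grad, error, k):
    """One error-feedback step: correct the gradient with the accumulated
    compression error, keep only the k largest-magnitude entries, and
    carry the residual into the next iteration so no signal is lost."""
    corrected = [g + e for g, e in zip(grad, error)]
    keep = sorted(range(len(corrected)), key=lambda i: abs(corrected[i]),
                  reverse=True)[:k]
    compressed = [corrected[i] if i in keep else 0.0
                  for i in range(len(corrected))]
    new_error = [c - s for c, s in zip(corrected, compressed)]
    return compressed, new_error
```

Because the residual is re-injected at every step, small entries that are repeatedly dropped eventually accumulate enough to be transmitted, which is why compression need not hurt convergence.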
    JABBERWOCK: A Tool for WebAssembly Dataset Generation and Its Application to Malicious Website Detection. (arXiv:2306.05698v1 [cs.CR])
Machine learning is often used for malicious website detection, but an approach incorporating WebAssembly as a feature has not been explored, to the best of our knowledge, due to the limited number of available samples. In this paper, we propose JABBERWOCK (JAvascript-Based Binary EncodeR by WebAssembly Optimization paCKer), a tool to generate WebAssembly datasets in a pseudo fashion via JavaScript. Loosely speaking, JABBERWOCK automatically gathers JavaScript code from the real world, converts it into WebAssembly, and then outputs vectors of the WebAssembly as samples for malicious website detection. We also conduct experimental evaluations of JABBERWOCK in terms of the processing time for dataset generation, comparison of the generated samples with actual WebAssembly samples gathered from the Internet, and an application to malicious website detection. Regarding the processing time, we show that JABBERWOCK can construct a dataset in 4.5 seconds per sample for any number of samples. Next, comparing 10,000 samples output by JABBERWOCK with 168 gathered WebAssembly samples, we believe that the samples generated by JABBERWOCK are similar to those in the real world. We then show that JABBERWOCK enables malicious website detection with a 99\% F1-score; the high score arises because JABBERWOCK creates a clear gap between benign and malicious samples. We also confirm that JABBERWOCK can be combined with an existing malicious website detection tool to improve F1-scores. JABBERWOCK is publicly available via GitHub (https://github.com/c-chocolate/Jabberwock).
    Distributed Consensus Algorithm for Decision-Making in Multi-agent Multi-armed Bandit. (arXiv:2306.05998v1 [cs.LG])
We study a structured multi-agent multi-armed bandit (MAMAB) problem in a dynamic environment. A graph reflects the information-sharing structure among agents, and the arms' reward distributions are piecewise-stationary with several unknown change points. The agents face the identical piecewise-stationary MAB problem. The goal is to develop a decision-making policy for the agents that minimizes the regret, which is the expected total loss of not playing the optimal arm at each time step. Our proposed solution, Restarted Bayesian Online Change Point Detection in Cooperative Upper Confidence Bound Algorithm (RBO-Coop-UCB), involves an efficient multi-agent UCB algorithm as its core enhanced with a Bayesian change point detector. We also develop a simple restart decision cooperation that improves decision-making. Theoretically, we establish that the expected group regret of RBO-Coop-UCB is upper bounded by $\mathcal{O}(KNM\log T + K\sqrt{MT\log T})$, where $K$ is the number of agents, $M$ is the number of arms, and $T$ is the number of time steps. Numerical experiments on synthetic and real-world datasets demonstrate that our proposed method outperforms the state-of-the-art algorithms.
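The core index each agent computes is the standard UCB score over observations gathered since the last detected change point; a minimal sketch (the exploration constant is illustrative, and the cooperative and Bayesian change-detection components are omitted):

```python
import math

def ucb_index(mean_reward, pull_count, t, c=2.0):
    """UCB score of one arm: empirical mean plus an exploration bonus that
    shrinks as the arm is pulled more. After a change point is detected,
    counts are reset, restarting exploration -- the 'restarted' part of
    the algorithm. The constant c is an illustrative choice."""
    if pull_count == 0:
        return float("inf")  # force at least one pull of every arm
    return mean_reward + math.sqrt(c * math.log(t) / pull_count)
```

At each step an agent plays the arm with the highest index, optionally sharing observations with neighbors on the communication graph.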
    Machine Vision Using Cellphone Camera: A Comparison of deep networks for classifying three challenging denominations of Indian Coins. (arXiv:2306.06084v1 [cs.CV])
Indian currency coins come in a variety of denominations. Of all the varieties, Rs.1, Rs.2, and Rs.5 have similar diameters. The majority of the coin styles in market circulation for the Rs.1 and Rs.2 denominations are nearly the same except for the numerals on their reverse side. If a coin is resting on its obverse side, the correct denomination is not distinguishable by humans. Therefore, it was hypothesized that a digital image of a coin resting on either side could be classified into its correct denomination by training a deep neural network model. The digital images were generated using cheap cell phone cameras. To find the most suitable deep neural network architecture, four architectures were selected for comparison based on a preliminary analysis. The results confirm that two of the four deep neural network models can classify the correct denomination from either side of a coin with an accuracy of 97%.
    Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding. (arXiv:2306.06094v1 [cs.CV])
    Recently, large language models (LLMs) have made significant advancements in natural language understanding and generation. However, their potential in computer vision remains largely unexplored. In this paper, we introduce a new, exploratory approach that enables LLMs to process images using the Scalable Vector Graphics (SVG) format. By leveraging the XML-based textual descriptions of SVG representations instead of raster images, we aim to bridge the gap between the visual and textual modalities, allowing LLMs to directly understand and manipulate images without the need for parameterized visual components. Our method facilitates simple image classification, generation, and in-context learning using only LLM capabilities. We demonstrate the promise of our approach across discriminative and generative tasks, highlighting its (i) robustness against distribution shift, (ii) substantial improvements achieved by tapping into the in-context learning abilities of LLMs, and (iii) image understanding and generation capabilities with human guidance. Our code, data, and models can be found here https://github.com/mu-cai/svg-llm.
    Active Learning with Weak Supervision for Gaussian Processes. (arXiv:2204.08335v2 [stat.ML] UPDATED)
    Annotating data for supervised learning can be costly. When the annotation budget is limited, active learning can be used to select and annotate those observations that are likely to give the most gain in model performance. We propose an active learning algorithm that, in addition to selecting which observation to annotate, selects the precision of the annotation that is acquired. Assuming that annotations with low precision are cheaper to obtain, this allows the model to explore a larger part of the input space, with the same annotation budget. We build our acquisition function on the previously proposed BALD objective for Gaussian Processes, and empirically demonstrate the gains of being able to adjust the annotation precision in the active learning loop.
    MetaGL: Evaluation-Free Selection of Graph Learning Models via Meta-Learning. (arXiv:2206.09280v3 [cs.LG] UPDATED)
    Given a graph learning task, such as link prediction, on a new graph, how can we select the best method as well as its hyperparameters (collectively called a model) without having to train or evaluate any model on the new graph? Model selection for graph learning has been largely ad hoc. A typical approach has been to apply popular methods to new datasets, but this is often suboptimal. On the other hand, systematically comparing models on the new graph quickly becomes too costly, or even impractical. In this work, we develop the first meta-learning approach for evaluation-free graph learning model selection, called MetaGL, which utilizes the prior performances of existing methods on various benchmark graph datasets to automatically select an effective model for the new graph, without any model training or evaluations. To quantify similarities across a wide variety of graphs, we introduce specialized meta-graph features that capture the structural characteristics of a graph. Then we design G-M network, which represents the relations among graphs and models, and develop a graph-based meta-learner operating on this G-M network, which estimates the relevance of each model to different graphs. Extensive experiments show that using MetaGL to select a model for the new graph greatly outperforms several existing meta-learning techniques tailored for graph learning model selection (up to 47% better), while being extremely fast at test time (~1 sec).
    Out-of-Variable Generalization for Discriminative Models. (arXiv:2304.07896v2 [cs.LG] UPDATED)
    The ability of an agent to do well in new environments is a critical aspect of intelligence. In machine learning, this ability is known as $\textit{strong}$ or $\textit{out-of-distribution}$ generalization. However, merely considering differences in data distributions is inadequate for fully capturing differences between learning environments. In the present paper, we investigate $\textit{out-of-variable}$ generalization, which pertains to an agent's generalization capabilities concerning environments with variables that were never jointly observed before. This skill closely reflects the process of animate learning: we, too, explore Nature by probing, observing, and measuring $\textit{subsets}$ of variables at any given time. Mathematically, $\textit{out-of-variable}$ generalization requires the efficient re-use of past marginal information, i.e., information over subsets of previously observed variables. We study this problem, focusing on prediction tasks across environments that contain overlapping, yet distinct, sets of causes. We show that after fitting a classifier, the residual distribution in one environment reveals the partial derivative of the true generating function with respect to the unobserved causal parent in that environment. We leverage this information and propose a method that exhibits non-trivial out-of-variable generalization performance when facing an overlapping, yet distinct, set of causal predictors.
    Beyond Reward: Offline Preference-guided Policy Optimization. (arXiv:2305.16217v2 [cs.LG] UPDATED)
    This study focuses on the topic of offline preference-based reinforcement learning (PbRL), a variant of conventional reinforcement learning that dispenses with the need for online interaction or specification of reward functions. Instead, the agent is provided with fixed offline trajectories and human preferences between pairs of trajectories to extract the dynamics and task information, respectively. Since the dynamics and task information are orthogonal, a naive approach would involve using preference-based reward learning followed by an off-the-shelf offline RL algorithm. However, this requires the separate learning of a scalar reward function, which is assumed to be an information bottleneck of the learning process. To address this issue, we propose the offline preference-guided policy optimization (OPPO) paradigm, which models offline trajectories and preferences in a one-step process, eliminating the need for separately learning a reward function. OPPO achieves this by introducing an offline hindsight information matching objective for optimizing a contextual policy and a preference modeling objective for finding the optimal context. OPPO further integrates a well-performing decision policy by optimizing the two objectives iteratively. Our empirical results demonstrate that OPPO effectively models offline preferences and outperforms prior competing baselines, including offline RL algorithms performed over either true or pseudo reward function specifications. Our code is available on the project website: https://sites.google.com/view/oppo-icml-2023 .
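For context, the scalar-reward pipeline that OPPO bypasses typically fits a Bradley-Terry model to the preference labels; a two-line sketch of that preference likelihood (shown to illustrate the bottleneck step, not OPPO itself):

```python
import math

def preference_prob(return_a, return_b):
    """Bradley-Terry preference likelihood: the probability that trajectory
    A is preferred over B under a learned scalar return. Fitting such a
    reward model from preference pairs is the separate learning step that
    OPPO eliminates by modeling trajectories and preferences jointly."""
    return 1.0 / (1.0 + math.exp(return_b - return_a))
```

Maximizing this likelihood over preference pairs yields the scalar reward that a downstream offline RL algorithm would then consume.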
    Debiasing Conditional Stochastic Optimization. (arXiv:2304.10613v2 [cs.LG] UPDATED)
    In this paper, we study the conditional stochastic optimization (CSO) problem which covers a variety of applications including portfolio selection, reinforcement learning, robust learning, causal inference, etc. The sample-averaged gradient of the CSO objective is biased due to its nested structure, and therefore requires a high sample complexity to reach convergence. We introduce a general stochastic extrapolation technique that effectively reduces the bias. We show that for nonconvex smooth objectives, combining this extrapolation with variance reduction techniques can achieve a significantly better sample complexity than existing bounds. Additionally, we develop new algorithms for the finite-sum variant of the CSO problem that also significantly improve upon existing results. Finally, we believe that our debiasing technique has the potential to be a useful tool for addressing similar challenges in other stochastic optimization problems.
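The nested-structure bias mentioned above, and the idea that extrapolation can cancel its leading term, can be seen in a small Monte Carlo sketch. This uses a classic Richardson-style extrapolation on a toy problem of my own choosing (f(u) = u^2, Gaussian inner samples), not the paper's estimator: the plug-in estimate of f(E[g]) carries an O(1/m) bias in the inner sample size m, which the two-scale combination removes.

```python
import numpy as np

rng = np.random.default_rng(1)

def plug_in(m, reps):
    """Plug-in estimates of f(E[g]) = (E[g])^2 with f(u) = u^2, g ~ N(1, 1).
    Biased: E[mean_m^2] = 1 + 1/m due to the nested expectation."""
    samples = rng.normal(1.0, 1.0, size=(reps, m))
    return samples.mean(axis=1) ** 2

reps = 200_000
t4 = plug_in(4, reps)          # bias ~ 1/4
t8 = plug_in(8, reps)          # bias ~ 1/8
extrapolated = 2 * t8 - t4     # cancels the O(1/m) bias term

bias_plain = float(t4.mean() - 1.0)
bias_extra = float(extrapolated.mean() - 1.0)
print(round(bias_plain, 3), round(bias_extra, 3))
```

The plain estimator's bias sits near 0.25 while the extrapolated one is near zero, at the cost of somewhat higher variance, which is the trade-off the paper's variance-reduction component addresses.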
    Provably Safe Reinforcement Learning with Step-wise Violation Constraints. (arXiv:2302.06064v3 [cs.LG] UPDATED)
    In this paper, we investigate a novel safe reinforcement learning problem with step-wise violation constraints. Our problem differs from existing works in that we consider stricter step-wise violation constraints and do not assume the existence of safe actions, making our formulation more suitable for safety-critical applications which need to ensure safety in all decision steps and may not always possess safe actions, e.g., robot control and autonomous driving. We propose a novel algorithm SUCBVI, which guarantees $\widetilde{O}(\sqrt{ST})$ step-wise violation and $\widetilde{O}(\sqrt{H^3SAT})$ regret. Lower bounds are provided to validate the optimality in both violation and regret performance with respect to $S$ and $T$. Moreover, we further study a novel safe reward-free exploration problem with step-wise violation constraints. For this problem, we design an $(\varepsilon,\delta)$-PAC algorithm SRF-UCRL, which achieves nearly state-of-the-art sample complexity $\widetilde{O}((\frac{S^2AH^2}{\varepsilon}+\frac{H^4SA}{\varepsilon^2})(\log(\frac{1}{\delta})+S))$, and guarantees $\widetilde{O}(\sqrt{ST})$ violation during the exploration. The experimental results demonstrate the superiority of our algorithms in safety performance, and corroborate our theoretical results.
    Supervised learning with probabilistic morphisms and kernel mean embeddings. (arXiv:2305.06348v4 [math.ST] UPDATED)
    In this paper I propose a concept of a correct loss function in a generative model of supervised learning for an input space $\mathcal{X}$ and a label space $\mathcal{Y}$, both of which are measurable spaces. A correct loss function in a generative model of supervised learning must accurately measure the discrepancy between elements of a hypothesis space $\mathcal{H}$ of possible predictors and the supervisor operator, even when the supervisor operator does not belong to $\mathcal{H}$. To define correct loss functions, I propose a characterization of a regular conditional probability measure $\mu_{\mathcal{Y}|\mathcal{X}}$ for a probability measure $\mu$ on $\mathcal{X} \times \mathcal{Y}$ relative to the projection $\Pi_{\mathcal{X}}: \mathcal{X}\times\mathcal{Y}\to \mathcal{X}$ as a solution of a linear operator equation. If $\mathcal{Y}$ is a separable metrizable topological space with the Borel $\sigma$-algebra $ \mathcal{B} (\mathcal{Y})$, I propose an additional characterization of a regular conditional probability measure $\mu_{\mathcal{Y}|\mathcal{X}}$ as a minimizer of mean square error on the space of Markov kernels, referred to as probabilistic morphisms, from $\mathcal{X}$ to $\mathcal{Y}$. This characterization utilizes kernel mean embeddings. Building upon these results and employing inner measure to quantify the generalizability of a learning algorithm, I extend a result due to Cucker-Smale, which addresses the learnability of a regression model, to the setting of a conditional probability estimation problem. Additionally, I present a variant of Vapnik's regularization method for solving stochastic ill-posed problems, incorporating inner measure, and showcase its applications.
    Guillotine Regularization: Why removing layers is needed to improve generalization in Self-Supervised Learning. (arXiv:2206.13378v2 [cs.LG] UPDATED)
One unexpected technique that emerged in recent years consists of training a Deep Network (DN) with a Self-Supervised Learning (SSL) method, and using this network on downstream tasks but with its last few projector layers entirely removed. This trick of throwing away the projector is actually critical for SSL methods to achieve competitive performance on ImageNet, where more than 30 percentage points can be gained this way. This is a little vexing, as one would hope that the network layer at which invariance is explicitly enforced by the SSL criterion during training (the last projector layer) should be the one to use for best generalization performance downstream. But it turns out not to be, and this study sheds some light on why. This trick, which we name Guillotine Regularization (GR), is in fact a generically applicable method that has been used to improve generalization performance in transfer learning scenarios. In this work, we identify the underlying reasons behind its success and show that the optimal layer to use might change significantly depending on the training setup, the data or the downstream task. Lastly, we give some insights on how to reduce the need for a projector in SSL by aligning the pretext SSL task and the downstream task.
    Using Image Transformations to Learn Network Structure. (arXiv:2112.03419v2 [stat.ML] UPDATED)
    Many learning tasks require observing a sequence of images and making a decision. In a transportation problem of designing and planning for shipping boxes between nodes, we show how to treat the network of nodes and the flows between them as images. These images have useful structural information that can be statistically summarized. Using image compression techniques, we reduce an image down to a set of numbers that contain interpretable geographic information that we call geographic signatures. Using geographic signatures, we learn network structure that can be utilized to recommend future network connectivity. We develop a Bayesian reinforcement algorithm that takes advantage of statistically summarized network information as priors and user-decisions to reinforce an agent's probabilistic decision. Additionally, we show how reinforcement learning can be used with compression directly without interpretation in simple tasks.
    Deep Isolation Forest for Anomaly Detection. (arXiv:2206.06602v4 [cs.LG] UPDATED)
    Isolation forest (iForest) has been emerging as arguably the most popular anomaly detector in recent years due to its general effectiveness across different benchmarks and strong scalability. Nevertheless, its linear axis-parallel isolation method often leads to (i) failure in detecting hard anomalies that are difficult to isolate in high-dimensional/non-linear-separable data space, and (ii) notorious algorithmic bias that assigns unexpectedly lower anomaly scores to artefact regions. These issues contribute to high false negative errors. Several iForest extensions are introduced, but they essentially still employ shallow, linear data partition, restricting their power in isolating true anomalies. Therefore, this paper proposes deep isolation forest. We introduce a new representation scheme that utilises casually initialised neural networks to map original data into random representation ensembles, where random axis-parallel cuts are subsequently applied to perform the data partition. This representation scheme facilitates high freedom of the partition in the original data space (equivalent to non-linear partition on subspaces of varying sizes), encouraging a unique synergy between random representations and random partition-based isolation. Extensive experiments show that our model achieves significant improvement over state-of-the-art isolation-based methods and deep detectors on tabular, graph and time series datasets; our model also inherits desired scalability from iForest.
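The representation-ensemble idea can be sketched in a few lines: map the data through several randomly initialised (untrained) networks, run an off-the-shelf axis-parallel isolation forest in each representation, and average the scores. This is a loose sketch assuming scikit-learn's `IsolationForest` as the base isolation mechanism and a single tanh layer as the random representation; the network architecture, ensemble size, and data are all illustrative choices, not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# 300 inliers plus one obvious anomaly in 5-D.
X = rng.standard_normal((300, 5))
X = np.vstack([X, np.full((1, 5), 6.0)])

def random_representation(X, out_dim, seed):
    """One member of the representation ensemble: a randomly initialised
    (untrained) one-layer tanh network mapping data to a new space."""
    r = np.random.default_rng(seed)
    W = r.standard_normal((X.shape[1], out_dim)) / np.sqrt(X.shape[1])
    return np.tanh(X @ W)

# Average axis-parallel isolation scores over several random representations.
scores = np.zeros(len(X))
for seed in range(5):
    Z = random_representation(X, out_dim=8, seed=seed)
    forest = IsolationForest(random_state=seed).fit(Z)
    scores += forest.score_samples(Z)   # higher score = more normal
scores /= 5

print(int(scores.argmin()))   # index of the most anomalous point
```

Each random representation induces a different non-linear partition of the original space, so the ensemble average is less tied to axis-parallel cuts than a single iForest would be.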
    adSformers: Personalization from Short-Term Sequences and Diversity of Representations in Etsy Ads. (arXiv:2302.01255v2 [cs.LG] UPDATED)
    In this article, we present a general approach to personalizing ads through encoding and learning from variable-length sequences of recent user actions and diverse representations. To this end we introduce a three-component module called the adSformer diversifiable personalization module (ADPM) that learns a dynamic user representation. We illustrate the module's effectiveness and flexibility by personalizing the Click-Through Rate (CTR) and Post-Click Conversion Rate (PCCVR) models used in sponsored search. The first component of the ADPM, the adSformer encoder, includes a novel adSformer block which learns the most salient sequence signals. ADPM's second component enriches the learned signal through visual, multimodal, and other pretrained representations. Lastly, the third ADPM "learned on the fly" component further diversifies the signal encoded in the dynamic user representation. The ADPM-personalized CTR and PCCVR models, henceforth referred to as adSformer CTR and adSformer PCCVR, outperform the CTR and PCCVR production baselines by $+2.66\%$ and $+2.42\%$, respectively, in offline Area Under the Receiver Operating Characteristic Curve (ROC-AUC). Following the robust online gains in A/B tests, Etsy Ads deployed the ADPM-personalized sponsored search system to $100\%$ of traffic as of February 2023.
    OCD: Learning to Overfit with Conditional Diffusion Models. (arXiv:2210.00471v5 [cs.LG] UPDATED)
    We present a dynamic model in which the weights are conditioned on an input sample x and are learned to match those that would be obtained by finetuning a base model on x and its label y. This mapping between an input sample and network weights is approximated by a denoising diffusion model. The diffusion model we employ focuses on modifying a single layer of the base model and is conditioned on the input, activations, and output of this layer. Since the diffusion model is stochastic in nature, multiple initializations generate different networks, forming an ensemble, which leads to further improvements. Our experiments demonstrate the wide applicability of the method for image classification, 3D reconstruction, tabular data, speech separation, and natural language processing. Our code is available at https://github.com/ShaharLutatiPersonal/OCD
    Visual Abstraction and Reasoning through Language. (arXiv:2303.04091v2 [cs.AI] UPDATED)
    While Artificial Intelligence (AI) models have achieved human or even superhuman performance in narrowly defined applications, they still struggle to show signs of broader and more flexible intelligence. The Abstraction and Reasoning Corpus (ARC), introduced by Fran\c{c}ois Chollet, aims to assess how close AI systems are to human-like cognitive abilities. Most current approaches rely on carefully handcrafted domain-specific languages (DSLs), which are used to brute-force solutions to the tasks present in ARC. In this work, we propose a general framework for solving ARC based on natural language descriptions of the tasks. While not yet beating state-of-the-art DSL models on ARC, we demonstrate the immense potential of our approach hinted at by the ability to solve previously unsolved tasks.
    Differentially Private Optimization for Smooth Nonconvex ERM. (arXiv:2302.04972v2 [cs.LG] UPDATED)
    We develop simple differentially private optimization algorithms that move along directions of (expected) descent to find an approximate second-order solution for nonconvex ERM. We use line search, mini-batching, and a two-phase strategy to improve the speed and practicality of the algorithm. Numerical experiments demonstrate the effectiveness of these approaches.
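A minimal sketch of the standard differentially private gradient step underlying this line of work (per-example clipping plus Gaussian noise) is below. This is the generic DP-SGD-style mechanism, not the paper's algorithm, which additionally uses line search, mini-batching, and a two-phase strategy; the clipping norm, noise multiplier, and step size here are hypothetical choices, and in practice the noise scale is calibrated from the target (epsilon, delta).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ERM instance: least squares on synthetic data.
n, d = 500, 5
X = rng.standard_normal((n, d))
w_true = np.ones(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

clip = 1.0        # per-example gradient clipping norm (hypothetical)
sigma = 0.5       # noise multiplier (hypothetical; set by the privacy budget)
lr = 0.1
w = np.zeros(d)

for _ in range(200):
    per_example = (X @ w - y)[:, None] * X            # per-example gradients, shape (n, d)
    norms = np.linalg.norm(per_example, axis=1, keepdims=True)
    clipped = per_example / np.maximum(1.0, norms / clip)  # bound each example's influence
    noisy_grad = (clipped.sum(0) + sigma * clip * rng.standard_normal(d)) / n
    w -= lr * noisy_grad
```

Clipping bounds the sensitivity of the summed gradient to any single example, which is what lets the added Gaussian noise provide a differential privacy guarantee for each descent direction.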
    Estimation of Ridge Using Nonlinear Transformation on Density Function. (arXiv:2306.05722v1 [cs.LG])
    Ridges play a vital role in accurately approximating the underlying structure of manifolds. In this paper, we explore the ridge's variation by applying a concave nonlinear transformation to the density function. Through the derivation of the Hessian matrix, we observe that nonlinear transformations yield a rank-one modification of the Hessian matrix. Leveraging the variational properties of eigenvalue problems, we establish a partial order inclusion relationship among the corresponding ridges. We intuitively discover that the transformation can lead to improved estimation of the tangent space via rank-one modification of the Hessian matrix. To validate our theories, we conduct extensive numerical experiments on synthetic and real-world datasets that demonstrate the superiority of the ridges obtained from our transformed approach in approximating the underlying truth manifold compared to other manifold fitting algorithms.
    Domain-Agnostic Batch Bayesian Optimization with Diverse Constraints via Bayesian Quadrature. (arXiv:2306.05843v1 [cs.LG])
    Real-world optimisation problems often feature complex combinations of (1) diverse constraints, (2) discrete and mixed spaces, and are (3) highly parallelisable. (4) There are also cases where the objective function cannot be queried if unknown constraints are not satisfied, e.g. in drug discovery, safety on animal experiments (unknown constraints) must be established before human clinical trials (querying objective function) may proceed. However, most existing works target each of the above three problems in isolation and do not consider (4) unknown constraints with query rejection. For problems with diverse constraints and/or unconventional input spaces, it is difficult to apply these techniques as they are often mutually incompatible. We propose cSOBER, a domain-agnostic prudent parallel active sampler for Bayesian optimisation, based on SOBER of Adachi et al. (2023). We consider infeasibility under unknown constraints as a type of integration error that we can estimate. We propose a theoretically-driven approach that propagates such error as a tolerance in the quadrature precision that automatically balances exploitation and exploration with the expected rejection rate. Moreover, our method flexibly accommodates diverse constraints and/or discrete and mixed spaces via adaptive tolerance, including conventional zero-risk cases. We show that cSOBER outperforms competitive baselines on diverse real-world blackbox-constrained problems, including safety-constrained drug discovery, and human-relationship-aware team optimisation over graph-structured space.
    A memory-efficient neural ODE framework based on high-level adjoint differentiation. (arXiv:2206.01298v3 [cs.LG] UPDATED)
    Neural ordinary differential equations (neural ODEs) have emerged as a novel network architecture that bridges dynamical systems and deep learning. However, the gradient obtained with the continuous adjoint method in the vanilla neural ODE is not reverse-accurate. Other approaches suffer either from an excessive memory requirement due to deep computational graphs or from limited choices for the time integration scheme, hampering their application to large-scale complex dynamical systems. To achieve accurate gradients without compromising memory efficiency and flexibility, we present a new neural ODE framework, PNODE, based on high-level discrete adjoint algorithmic differentiation. By leveraging discrete adjoint time integrators and advanced checkpointing strategies tailored for these integrators, PNODE can provide a balance between memory and computational costs, while computing the gradients consistently and accurately. We provide an open-source implementation based on PyTorch and PETSc, one of the most commonly used portable, scalable scientific computing libraries. We demonstrate the performance through extensive numerical experiments on image classification and continuous normalizing flow problems. We show that PNODE achieves the highest memory efficiency when compared with other reverse-accurate methods. On the image classification problems, PNODE is up to two times faster than the vanilla neural ODE and up to 2.3 times faster than the best existing reverse-accurate method. We also show that PNODE enables the use of the implicit time integration methods that are needed for stiff dynamical systems.
    How to Backdoor Diffusion Models?. (arXiv:2212.05400v3 [cs.CV] UPDATED)
    Diffusion models are state-of-the-art deep learning empowered generative models that are trained based on the principle of learning forward and reverse diffusion processes via progressive noise-addition and denoising. To gain a better understanding of the limitations and potential risks, this paper presents the first study on the robustness of diffusion models against backdoor attacks. Specifically, we propose BadDiffusion, a novel attack framework that engineers compromised diffusion processes during model training for backdoor implantation. At the inference stage, the backdoored diffusion model will behave just like an untampered generator for regular data inputs, while falsely generating some targeted outcome designed by the bad actor upon receiving the implanted trigger signal. Such a critical risk can be dreadful for downstream tasks and applications built upon the problematic model. Our extensive experiments on various backdoor attack settings show that BadDiffusion can consistently lead to compromised diffusion models with high utility and target specificity. Even worse, BadDiffusion can be made cost-effective by simply finetuning a clean pre-trained diffusion model to implant backdoors. We also explore some possible countermeasures for risk mitigation. Our results call attention to potential risks and possible misuse of diffusion models. Our code is available on https://github.com/IBM/BadDiffusion.
    Overcoming the Long Horizon Barrier for Sample-Efficient Reinforcement Learning with Latent Low-Rank Structure. (arXiv:2206.03569v4 [cs.LG] UPDATED)
    The practicality of reinforcement learning algorithms has been limited due to poor scaling with respect to the problem size, as the sample complexity of learning an $\epsilon$-optimal policy is $\tilde{\Omega}\left(|S||A|H^3 / \epsilon^2\right)$ over worst case instances of an MDP with state space $S$, action space $A$, and horizon $H$. We consider a class of MDPs for which the associated optimal $Q^*$ function is low rank, where the latent features are unknown. While one would hope to achieve linear sample complexity in $|S|$ and $|A|$ due to the low rank structure, we show that without imposing further assumptions beyond low rank of $Q^*$, if one is constrained to estimate the $Q$ function using only observations from a subset of entries, there is a worst case instance in which one must incur a sample complexity exponential in the horizon $H$ to learn a near optimal policy. We subsequently show that under stronger low rank structural assumptions, given access to a generative model, Low Rank Monte Carlo Policy Iteration (LR-MCPI) and Low Rank Empirical Value Iteration (LR-EVI) achieve the desired sample complexity of $\tilde{O}\left((|S|+|A|)\mathrm{poly}(d,H)/\epsilon^2\right)$ for a rank $d$ setting, which is minimax optimal with respect to the scaling of $|S|, |A|$, and $\epsilon$. In contrast to literature on linear and low-rank MDPs, we do not require a known feature mapping, our algorithm is computationally simple, and our results hold for long time horizons. Our results provide insights on the minimal low-rank structural assumptions required on the MDP with respect to the transition kernel versus the optimal action-value function.
    Reading Between the Lines: Modeling User Behavior and Costs in AI-Assisted Programming. (arXiv:2210.14306v4 [cs.SE] UPDATED)
    Code-recommendation systems, such as Copilot and CodeWhisperer, have the potential to improve programmer productivity by suggesting and auto-completing code. However, to fully realize their potential, we must understand how programmers interact with these systems and identify ways to improve that interaction. To make progress, we studied GitHub Copilot, a code-recommendation system used by millions of programmers daily. We developed CUPS, a taxonomy of common programmer activities when interacting with Copilot. Our study of 21 programmers, who completed coding tasks and retrospectively labeled their sessions with CUPS, showed that CUPS can help us understand how programmers interact with code-recommendation systems, revealing inefficiencies and time costs. Our insights reveal how programmers interact with Copilot and motivate new interface designs and metrics.
    Graph Generative Model for Benchmarking Graph Neural Networks. (arXiv:2207.04396v4 [cs.LG] UPDATED)
    As the field of Graph Neural Networks (GNN) continues to grow, it experiences a corresponding increase in the need for large, real-world datasets to train and test new GNN models on challenging, realistic problems. Unfortunately, such graph datasets are often generated from online, highly privacy-restricted ecosystems, which makes research and development on these datasets hard, if not impossible. This greatly reduces the amount of benchmark graphs available to researchers, causing the field to rely only on a handful of publicly-available datasets. To address this problem, we introduce a novel graph generative model, Computation Graph Transformer (CGT) that learns and reproduces the distribution of real-world graphs in a privacy-controlled way. More specifically, CGT (1) generates effective benchmark graphs on which GNNs show similar task performance as on the source graphs, (2) scales to process large-scale graphs, (3) incorporates off-the-shelf privacy modules to guarantee end-user privacy of the generated graph. Extensive experiments across a vast body of graph generative models show that only our model can successfully generate privacy-controlled, synthetic substitutes of large-scale real-world graphs that can be effectively used to benchmark GNN models.
    A One-shot Framework for Distributed Clustered Learning in Heterogeneous Environments. (arXiv:2209.10866v4 [cs.LG] UPDATED)
    The paper proposes a family of communication efficient methods for distributed learning in heterogeneous environments in which users obtain data from one of $K$ different distributions. In the proposed setup, the grouping of users (based on the data distributions they sample), as well as the underlying statistical properties of the distributions, are apriori unknown. A family of One-shot Distributed Clustered Learning methods (ODCL-$\mathcal{C}$) is proposed, parametrized by the set of admissible clustering algorithms $\mathcal{C}$, with the objective of learning the true model at each user. The admissible clustering methods include $K$-means (KM) and convex clustering (CC), giving rise to various one-shot methods within the proposed family, such as ODCL-KM and ODCL-CC. The proposed one-shot approach, based on local computations at the users and a clustering based aggregation step at the server is shown to provide strong learning guarantees. In particular, for strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error (MSE) rates in terms of the sample size. An explicit characterization of the threshold is provided in terms of problem parameters. The trade-offs with respect to selecting various clustering methods (ODCL-CC, ODCL-KM) are discussed and significant improvements over state-of-the-art are demonstrated. Numerical experiments illustrate the findings and corroborate the performance of the proposed methods.
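The one-shot pipeline (local computation at the users, clustering-based aggregation at the server) can be sketched as follows. This is a toy in the spirit of ODCL-KM, assuming scikit-learn's `KMeans` as the admissible clustering algorithm and local least squares as the user computation; the group sizes, noise level, and models are illustrative, not the paper's setup.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
d, users_per_group, m = 3, 10, 200

# Two ground-truth models; each user samples data from one of them.
true = {0: np.ones(d), 1: -np.ones(d)}
local_models = []
for k in (0, 1):
    for _ in range(users_per_group):
        Xu = rng.standard_normal((m, d))
        yu = Xu @ true[k] + 0.5 * rng.standard_normal(m)
        # Local computation: each user fits its own least-squares model.
        local_models.append(np.linalg.lstsq(Xu, yu, rcond=None)[0])
local_models = np.array(local_models)

# One-shot aggregation at the server: cluster the local models,
# then average within each cluster.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(local_models)
merged = {c: local_models[km.labels_ == c].mean(axis=0) for c in (0, 1)}

# Each merged model should match one of the two ground-truth models.
errs = [min(np.linalg.norm(v - true[0]), np.linalg.norm(v - true[1]))
        for v in merged.values()]
print([round(float(e), 3) for e in errs])
```

Averaging within a correctly recovered cluster drives the per-user MSE down with the total sample size, which is the order-optimal behaviour the paper establishes once the number of points per user exceeds a threshold.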
    Data-Adaptive Probabilistic Likelihood Approximation for Ordinary Differential Equations. (arXiv:2306.05566v1 [stat.ML])
    Parameter inference for ordinary differential equations (ODEs) is of fundamental importance in many scientific applications. While ODE solutions are typically approximated by deterministic algorithms, new research on probabilistic solvers indicates that they produce more reliable parameter estimates by better accounting for numerical errors. However, many ODE systems are highly sensitive to their parameter values. This produces deep local minima in the likelihood function -- a problem which existing probabilistic solvers have yet to resolve. Here, we show that a Bayesian filtering paradigm for probabilistic ODE solution can dramatically reduce sensitivity to parameters by learning from the noisy ODE observations in a data-adaptive manner. Our method is applicable to ODEs with partially unobserved components and with arbitrary non-Gaussian noise. Several examples demonstrate that it is more accurate than existing probabilistic ODE solvers, and even in some cases than the exact ODE likelihood.
    A Systematic Review of Automated Query Reformulations in Source Code Search. (arXiv:2108.09646v2 [cs.SE] UPDATED)
    Fixing software bugs and adding new features are two of the major maintenance tasks. Software bugs and features are reported as change requests. Developers consult these requests and often choose a few keywords from them as an ad hoc query. Then they execute the query with a search engine to find the exact locations within software code that need to be changed. Unfortunately, even experienced developers often fail to choose appropriate queries, which leads to costly trials and errors during a code search. Over the years, many studies attempt to reformulate the ad hoc queries from developers to support them. In this systematic literature review, we carefully select 70 primary studies on query reformulations from 2,970 candidate studies, perform an in-depth qualitative analysis (e.g., Grounded Theory), and then answer seven research questions with major findings. First, to date, eight major methodologies (e.g., term weighting, term co-occurrence analysis, thesaurus lookup) have been adopted to reformulate queries. Second, the existing studies suffer from several major limitations (e.g., lack of generalizability, vocabulary mismatch problem, subjective bias) that might prevent their wide adoption. Finally, we discuss the best practices and future opportunities to advance the state of research in search query reformulations.
    Revisiting Permutation Symmetry for Merging Models between Different Datasets. (arXiv:2306.05641v1 [cs.LG])
Model merging is a new approach to creating a new model by combining the weights of different trained models. Previous studies report that model merging works well for models trained on a single dataset with different random seeds, while model merging between different datasets is difficult. Merging knowledge from different datasets has practical significance, but it has not been well investigated. In this paper, we investigate the properties of merging models between different datasets. Through theoretical and empirical analyses, we find that the accuracy of the merged model decreases more significantly as the datasets diverge more and that the different loss landscapes for each dataset make model merging between different datasets difficult. We also show that merged models require datasets for merging in order to achieve a high accuracy. Furthermore, we show that condensed datasets created by dataset condensation can be used as substitutes for the original datasets when merging models. We conduct experiments for model merging between different datasets. When merging between MNIST and Fashion-MNIST models, the accuracy significantly improves by 28% using the dataset and 25% using the condensed dataset compared with not using the dataset.
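In the linear case, weight averaging between models trained on different datasets can be sketched directly; the toy below is my own construction, not the paper's experiments. Note that for neural networks this naive averaging generally fails without first accounting for permutation symmetry, which is exactly the obstacle the paper revisits; linear models have no such symmetry, so averaging works as-is.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 4, 500

w0 = np.ones(d)
w_a, w_b = w0 + 0.5, w0 - 0.5   # the two datasets' true models differ

def make(w):
    X = rng.standard_normal((m, d))
    return X, X @ w + 0.1 * rng.standard_normal(m)

Xa, ya = make(w_a)
Xb, yb = make(w_b)

# Train one linear model per dataset, then merge by weight averaging.
wa = np.linalg.lstsq(Xa, ya, rcond=None)[0]
wb = np.linalg.lstsq(Xb, yb, rcond=None)[0]
merged = (wa + wb) / 2

def mse(w):
    """Mean squared error on the union of both datasets."""
    X, y = np.vstack([Xa, Xb]), np.concatenate([ya, yb])
    return float(np.mean((X @ w - y) ** 2))

print(round(mse(wa), 3), round(mse(wb), 3), round(mse(merged), 3))
```

The merged model does better on the combined data than either specialist, while being worse than each specialist on its own dataset; the gap grows as the two true models diverge, mirroring the paper's finding that merging degrades with dataset divergence.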
    A Unified Approach to Synchronization Problems over Subgroups of the Orthogonal Group. (arXiv:2009.07514v3 [math.OC] UPDATED)
    The problem of synchronization over a group $\mathcal{G}$ aims to estimate a collection of group elements $G^*_1, \dots, G^*_n \in \mathcal{G}$ based on noisy observations of a subset of all pairwise ratios of the form $G^*_i {G^*_j}^{-1}$. Such a problem has gained much attention recently and finds many applications across a wide range of scientific and engineering areas. In this paper, we consider the class of synchronization problems in which the group is a closed subgroup of the orthogonal group. This class covers many group synchronization problems that arise in practice. Our contribution is fivefold. First, we propose a unified approach for solving this class of group synchronization problems, which consists of a suitable initialization step and an iterative refinement step based on the generalized power method, and show that it enjoys a strong theoretical guarantee on the estimation error under certain assumptions on the group, measurement graph, noise, and initialization. Second, we formulate two geometric conditions that are required by our approach and show that they hold for various practically relevant subgroups of the orthogonal group. The conditions are closely related to the error-bound geometry of the subgroup -- an important notion in optimization. Third, we verify the assumptions on the measurement graph and noise for standard random graph and random matrix models. Fourth, based on the classic notion of metric entropy, we develop and analyze a novel spectral-type estimator. Finally, we show via extensive numerical experiments that our proposed non-convex approach outperforms existing approaches in terms of computational speed, scalability, and/or estimation error.
    Estimating and Controlling for Equalized Odds via Sensitive Attribute Predictors. (arXiv:2207.12497v4 [cs.LG] UPDATED)
    As the use of machine learning models in real world high-stakes decision settings continues to grow, it is highly important that we are able to audit and control for any potential fairness violations these models may exhibit towards certain groups. To do so, one naturally requires access to sensitive attributes, such as demographics, gender, or other potentially sensitive features that determine group membership. Unfortunately, in many settings, this information is often unavailable. In this work we study the well known \emph{equalized odds} (EOD) definition of fairness. In a setting without sensitive attributes, we first provide tight and computable upper bounds for the EOD violation of a predictor. These bounds precisely reflect the worst possible EOD violation. Second, we demonstrate how one can provably control the worst-case EOD by a new post-processing correction method. Our results characterize when directly controlling for EOD with respect to the predicted sensitive attributes is -- and when is not -- optimal when it comes to controlling worst-case EOD. Our results hold under assumptions that are milder than previous works, and we illustrate these results with experiments on synthetic and real datasets.
    Optimal Variable Clustering for High-Dimensional Matrix Valued Data. (arXiv:2112.12909v2 [stat.ML] UPDATED)
    Matrix valued data has become increasingly prevalent in many applications. Most of the existing clustering methods for this type of data are tailored to the mean model and do not account for the dependence structure of the features, which can be very informative, especially in high-dimensional settings. To extract the information from the dependence structure for clustering, we propose a new latent variable model for the features arranged in matrix form, with some unknown membership matrices representing the clusters for the rows and columns. Under this model, we further propose a class of hierarchical clustering algorithms using the difference of a weighted covariance matrix as the dissimilarity measure. Theoretically, we show that under mild conditions, our algorithm attains clustering consistency in the high-dimensional setting. While this consistency result holds for our algorithm with a broad class of weighted covariance matrices, the conditions for this result depend on the choice of the weight. To investigate how the weight affects the theoretical performance of our algorithm, we establish the minimax lower bound for clustering under our latent variable model. Given these results, we identify the optimal weight in the sense that using this weight guarantees our algorithm to be minimax rate-optimal in terms of the magnitude of some cluster separation metric. The practical implementation of our algorithm with the optimal weight is also discussed. Finally, we conduct simulation studies to evaluate the finite sample performance of our algorithm and apply the method to a genomic dataset.
    L0Learn: A Scalable Package for Sparse Learning using L0 Regularization. (arXiv:2202.04820v2 [cs.LG] UPDATED)
    We present L0Learn: an open-source package for sparse linear regression and classification using $\ell_0$ regularization. L0Learn implements scalable, approximate algorithms, based on coordinate descent and local combinatorial optimization. The package is built using C++ and has user-friendly R and Python interfaces. L0Learn can address problems with millions of features, achieving competitive run times and statistical performance with state-of-the-art sparse learning packages. L0Learn is available on both CRAN and GitHub (https://cran.r-project.org/package=L0Learn and https://github.com/hazimehh/L0Learn).
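The abstract describes scalable approximate algorithms for $\ell_0$-regularized regression. As a rough illustration of what an $\ell_0$ constraint does (not L0Learn's actual coordinate-descent and local combinatorial search, and all names here are hypothetical), a minimal sketch using iterative hard thresholding:

```python
import numpy as np

def l0_iht(X, y, k, n_iter=300):
    """Sparse linear regression under an L0 constraint (||beta||_0 <= k)
    via iterative hard thresholding -- an illustrative stand-in for the
    coordinate-descent / local combinatorial search L0Learn implements."""
    p = X.shape[1]
    lr = 1.0 / np.linalg.norm(X, 2) ** 2            # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        beta = beta - lr * (X.T @ (X @ beta - y))   # gradient step on 0.5*||y - X beta||^2
        beta[np.argsort(np.abs(beta))[:-k]] = 0.0   # keep the k largest-magnitude coefficients
    return beta
```

The thresholding step is what makes the problem combinatorial; L0Learn's algorithms are far more efficient than this dense sketch on problems with millions of features.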
    Distributed Task Management in Fog Computing: A Socially Concave Bandit Game. (arXiv:2203.14572v2 [cs.MA] UPDATED)
    Fog computing leverages the task offloading capabilities at the network's edge to improve efficiency and enable swift responses to application demands. However, the design of task allocation strategies in a fog computing network is still challenging because of the heterogeneity of fog nodes and uncertainties in system dynamics. We formulate the distributed task allocation problem as a social-concave game with bandit feedback and show that the game has a unique Nash equilibrium, which is implementable using no-regret learning strategies (regret with sublinear growth). We then develop two no-regret online decision-making strategies. One strategy, namely bandit gradient ascent with momentum, is an online convex optimization algorithm with bandit feedback. The other strategy, Lipschitz bandit with initialization, is an EXP3 multi-armed bandit algorithm. We establish regret bounds for both strategies and analyze their convergence characteristics. Moreover, we compare the proposed strategies with an allocation strategy named learning with linear rewards. Theoretical and numerical analyses show the superior performance of the proposed strategies for efficient task allocation compared to the state-of-the-art methods.
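The second strategy builds on EXP3, a standard no-regret algorithm for adversarial multi-armed bandits. A minimal generic EXP3 sketch (without the paper's Lipschitz-bandit initialization; the reward matrix interface is an assumption for illustration):

```python
import numpy as np

def exp3(rewards, gamma=0.1, rng=None):
    """Minimal EXP3 for a K-armed adversarial bandit.
    `rewards[t, a]` is the reward (bounded in [0, 1]) of arm a at round t;
    only the pulled arm's reward is revealed to the learner."""
    rng = rng or np.random.default_rng(0)
    T, K = rewards.shape
    w = np.ones(K)                                   # exponential weights
    total = 0.0
    for t in range(T):
        p = (1 - gamma) * w / w.sum() + gamma / K    # mix in uniform exploration
        a = rng.choice(K, p=p)
        x = rewards[t, a]
        total += x
        w[a] *= np.exp(gamma * (x / p[a]) / K)       # importance-weighted update
    return total
```

The importance weighting `x / p[a]` keeps the reward estimate unbiased even though only one arm is observed per round, which is what yields the sublinear-regret guarantee.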
    Context-NER: Contextual Phrase Generation at Scale. (arXiv:2109.08079v4 [cs.IR] UPDATED)
    Named Entity Recognition (NER) has seen significant progress in recent years, with numerous state-of-the-art (SOTA) models achieving high performance. However, very few studies have focused on the generation of entities' context. In this paper, we introduce CONTEXT-NER, a task that aims to generate the relevant context for entities in a sentence, where the context is a phrase describing the entity but not necessarily present in the sentence. To facilitate research in this task, we also present the EDGAR10-Q dataset, which consists of annual and quarterly reports from the top 1500 publicly traded companies. The dataset is the largest of its kind, containing 1M sentences, 2.8M entities, and an average of 35 tokens per sentence, making it a challenging dataset. We propose a baseline approach that combines a phrase generation algorithm with inferencing using a 220M language model, achieving a ROUGE-L score of 27% on the test split. Additionally, we perform a one-shot inference with ChatGPT, which obtains a 30% ROUGE-L, highlighting the difficulty of the dataset. We also evaluate models such as T5 and BART, which achieve a maximum ROUGE-L of 49% after supervised finetuning on EDGAR10-Q. We also find that T5-large, when pre-finetuned on EDGAR10-Q, achieves SOTA results on downstream finance tasks such as Headline, FPB, and FiQA SA, outperforming the vanilla version by 10.81 points. To our surprise, this 66x smaller pre-finetuned model also surpasses the finance-specific LLM BloombergGPT-50B by 15 points. We hope that our dataset and generated artifacts will encourage further research in this direction, leading to the development of more sophisticated language models for financial text analysis.
    NuCLR: Nuclear Co-Learned Representations. (arXiv:2306.06099v1 [nucl-th])
    We introduce Nuclear Co-Learned Representations (NuCLR), a deep learning model that predicts various nuclear observables, including binding and decay energies, and nuclear charge radii. The model is trained using a multi-task approach with shared representations and obtains state-of-the-art performance, achieving levels of precision that are crucial for understanding fundamental phenomena in nuclear (astro)physics. We also report an intriguing finding that the learned representation of NuCLR exhibits the prominent emergence of crucial aspects of the nuclear shell model, namely the shell structure, including the well-known magic numbers, and the Pauli Exclusion Principle. This suggests that the model is capable of capturing the underlying physical principles and that our approach has the potential to offer valuable insights into nuclear theory.
    Multi-body SE(3) Equivariance for Unsupervised Rigid Segmentation and Motion Estimation. (arXiv:2306.05584v1 [cs.CV])
    A truly generalizable approach to rigid segmentation and motion estimation is fundamental to 3D understanding of articulated objects and moving scenes. In view of the tightly coupled relationship between segmentation and motion estimates, we present an SE(3) equivariant architecture and a training strategy to tackle this task in an unsupervised manner. Our architecture comprises two lightweight and inter-connected heads that predict segmentation masks using point-level invariant features and motion estimates from SE(3) equivariant features without the prerequisites of category information. Our unified training strategy can be performed online while jointly optimizing the two predictions by exploiting the interrelations among scene flow, segmentation mask, and rigid transformations. We show experiments on four datasets as evidence of the superiority of our method both in terms of model performance and computational efficiency with only 0.25M parameters and 0.92G FLOPs. To the best of our knowledge, this is the first work designed for category-agnostic part-level SE(3) equivariance in dynamic point clouds.
    On the Importance of Feature Decorrelation for Unsupervised Representation Learning in Reinforcement Learning. (arXiv:2306.05637v1 [cs.LG])
    Recently, unsupervised representation learning (URL) has improved the sample efficiency of Reinforcement Learning (RL) by pretraining a model from a large unlabeled dataset. The underlying principle of these methods is to learn temporally predictive representations by predicting future states in the latent space. However, an important challenge of this approach is the representational collapse, where the subspace of the latent representations collapses into a low-dimensional manifold. To address this issue, we propose a novel URL framework that causally predicts future states while increasing the dimension of the latent manifold by decorrelating the features in the latent space. Through extensive empirical studies, we demonstrate that our framework effectively learns predictive representations without collapse, which significantly improves the sample efficiency of state-of-the-art URL methods on the Atari 100k benchmark. The code is available at https://github.com/dojeon-ai/SimTPR.
    Prodigy: An Expeditiously Adaptive Parameter-Free Learner. (arXiv:2306.06101v1 [cs.LG])
    We consider the problem of estimating the learning rate in adaptive methods, such as Adagrad and Adam. We describe two techniques, Prodigy and Resetting, to provably estimate the distance to the solution $D$, which is needed to set the learning rate optimally. Our techniques are modifications of the D-Adaptation method for learning-rate-free learning. Our methods improve upon the convergence rate of D-Adaptation by a factor of $O(\sqrt{\log(D/d_0)})$, where $d_0$ is the initial estimate of $D$. We test our methods on 12 common logistic-regression benchmark datasets, VGG11 and ResNet-50 training on CIFAR10, ViT training on ImageNet, LSTM training on IWSLT14, DLRM training on the Criteo dataset, VarNet on the Knee MRI dataset, as well as RoBERTa and GPT transformer training on BookWiki. Our experimental results show that our approaches consistently outperform D-Adaptation and reach test accuracy values close to those of hand-tuned Adam.
    BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning. (arXiv:2206.08657v5 [cs.CV] UPDATED)
    Vision-Language (VL) models with the Two-Tower architecture have dominated visual-language representation learning in recent years. Current VL models either use lightweight uni-modal encoders and learn to extract, align and fuse both modalities simultaneously in a deep cross-modal encoder, or feed the last-layer uni-modal representations from the deep pre-trained uni-modal encoders into the top cross-modal encoder. Both approaches potentially restrict vision-language representation learning and limit model performance. In this paper, we propose BridgeTower, which introduces multiple bridge layers that build a connection between the top layers of uni-modal encoders and each layer of the cross-modal encoder. This enables effective bottom-up cross-modal alignment and fusion between visual and textual representations of different semantic levels of pre-trained uni-modal encoders in the cross-modal encoder. Pre-trained with only 4M images, BridgeTower achieves state-of-the-art performance on various downstream vision-language tasks. In particular, on the VQAv2 test-std set, BridgeTower achieves an accuracy of 78.73%, outperforming the previous state-of-the-art model METER by 1.09% with the same pre-training data and almost negligible additional parameters and computational costs. Notably, when further scaling the model, BridgeTower achieves an accuracy of 81.15%, surpassing models that are pre-trained on orders-of-magnitude larger datasets. Code and checkpoints are available at https://github.com/microsoft/BridgeTower.
    Differentially Private Sharpness-Aware Training. (arXiv:2306.05651v1 [cs.LG])
    Training deep learning models with differential privacy (DP) results in a degradation of performance. The training dynamics of models with DP show a significant difference from standard training, whereas understanding the geometric properties of private learning remains largely unexplored. In this paper, we investigate sharpness, a key factor in achieving better generalization, in private learning. We show that flat minima can help reduce the negative effects of per-example gradient clipping and the addition of Gaussian noise. We then verify the effectiveness of Sharpness-Aware Minimization (SAM) for seeking flat minima in private learning. However, we also discover that SAM is detrimental to the privacy budget and computational time due to its two-step optimization. Thus, we propose a new sharpness-aware training method that mitigates the privacy-optimization trade-off. Our experimental results demonstrate that the proposed method improves the performance of deep learning models with DP, both when training from scratch and when fine-tuning. Code is available at https://github.com/jinseongP/DPSAT.
    One-Shot Machine Unlearning with Mnemonic Code. (arXiv:2306.05670v1 [cs.LG])
    Deep learning has achieved significant improvements in accuracy and has been applied to various fields. With the spread of deep learning, a new problem has also emerged: deep learning models can sometimes contain information that is undesirable from an ethical standpoint. This problem must be resolved if deep learning is to make sensitive decisions such as hiring and prison sentencing. Machine unlearning (MU) is the research area that responds to such demands. MU aims to make a trained deep learning model forget undesirable training data. A naive MU approach is to re-train the whole model on the training data from which the undesirable data have been removed. However, re-training the whole model can take a huge amount of time and consumes significant computational resources. To make MU more practical, a simple yet effective MU method is required. In this paper, we propose a one-shot MU method, which does not need additional training. To design one-shot MU, we add noise to the model parameters that are sensitive to undesirable information. In our proposed method, we use the Fisher information matrix (FIM) to estimate the sensitive model parameters. Existing methods usually evaluate the FIM on the training data. In contrast, we avoid the need to retain the training data for calculating the FIM by using class-specific synthetic signals called mnemonic code. Extensive experiments using artificial and natural datasets demonstrate that our method outperforms the existing methods.
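The general recipe the abstract describes (estimate a FIM, then perturb the sensitive parameters) can be sketched in a few lines. This is a toy illustration under stated assumptions — a diagonal FIM estimated from per-example gradients, and a hypothetical noise scale `sigma` — not the paper's method, which crucially estimates the FIM from mnemonic codes rather than training data:

```python
import numpy as np

def fim_noise_unlearn(theta, grads_forget, sigma=0.5, rng=None):
    """Toy one-shot unlearning step: estimate a diagonal Fisher
    information matrix from per-example gradients on the data to
    forget, then inject Gaussian noise into the parameters in
    proportion to their sensitivity."""
    rng = rng or np.random.default_rng(0)
    fim = np.mean(grads_forget ** 2, axis=0)    # diagonal FIM estimate
    scale = fim / (fim.max() + 1e-12)           # normalized sensitivity per parameter
    return theta + sigma * scale * rng.standard_normal(theta.shape)
```

Parameters that carry no information about the forget data (zero FIM entries) are left untouched, which is the sense in which the perturbation is targeted rather than a blanket reset.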
    Equivariant vs. Invariant Layers: A Comparison of Backbone and Pooling for Point Cloud Classification. (arXiv:2306.05553v1 [cs.CV])
    Learning from set-structured data, such as point clouds, has gained significant attention from the community. Geometric deep learning provides a blueprint for designing effective set neural networks by incorporating permutation symmetry. Of our interest are permutation invariant networks, which are composed of a permutation equivariant backbone, permutation invariant global pooling, and regression/classification head. While existing literature has focused on improving permutation equivariant backbones, the impact of global pooling is often overlooked. In this paper, we examine the interplay between permutation equivariant backbones and permutation invariant global pooling on three benchmark point cloud classification datasets. Our findings reveal that: 1) complex pooling methods, such as transport-based or attention-based poolings, can significantly boost the performance of simple backbones, but the benefits diminish for more complex backbones, 2) even complex backbones can benefit from pooling layers in low data scenarios, 3) surprisingly, the choice of pooling layers can have a more significant impact on the model's performance than adjusting the width and depth of the backbone, and 4) pairwise combination of pooling layers can significantly improve the performance of a fixed backbone. Our comprehensive study provides insights for practitioners to design better permutation invariant set neural networks.
    Efficient Uncertainty Quantification and Reduction for Over-Parameterized Neural Networks. (arXiv:2306.05674v1 [stat.ML])
    Uncertainty quantification (UQ) is important for reliability assessment and enhancement of machine learning models. In deep learning, uncertainties arise not only from data, but also from the training procedure that often injects substantial noises and biases. These hinder the attainment of statistical guarantees and, moreover, impose computational challenges on UQ due to the need for repeated network retraining. Building upon the recent neural tangent kernel theory, we create statistically guaranteed schemes to principally \emph{quantify}, and \emph{remove}, the procedural uncertainty of over-parameterized neural networks with very low computation effort. In particular, our approach, based on what we call a procedural-noise-correcting (PNC) predictor, removes the procedural uncertainty by using only \emph{one} auxiliary network that is trained on a suitably labeled data set, instead of many retrained networks employed in deep ensembles. Moreover, by combining our PNC predictor with suitable light-computation resampling methods, we build several approaches to construct asymptotically exact-coverage confidence intervals using as low as four trained networks without additional overheads.
    PeFLL: A Lifelong Learning Approach to Personalized Federated Learning. (arXiv:2306.05515v1 [cs.LG])
    Personalized federated learning (pFL) has emerged as a popular approach to dealing with the challenge of statistical heterogeneity between the data distributions of the participating clients. Instead of learning a single global model, pFL aims to learn an individual model for each client while still making use of the data available at other clients. In this work, we present PeFLL, a new pFL approach rooted in lifelong learning that performs well not only on clients present during its training phase, but also on any that may emerge in the future. PeFLL learns to output client specific models by jointly training an embedding network and a hypernetwork. The embedding network learns to represent clients in a latent descriptor space in a way that reflects their similarity to each other. The hypernetwork learns a mapping from this latent space to the space of possible client models. We demonstrate experimentally that PeFLL produces models of superior accuracy compared to previous methods, especially for clients not seen during training, and that it scales well to large numbers of clients. Moreover, generating a personalized model for a new client is efficient as no additional fine-tuning or optimization is required by either the client or the server. We also present theoretical results supporting PeFLL in the form of a new PAC-Bayesian generalization bound for lifelong learning and we prove the convergence of our proposed optimization procedure.
    Detecting Check-Worthy Claims in Political Debates, Speeches, and Interviews Using Audio Data. (arXiv:2306.05535v1 [cs.CL])
    A large portion of society united around the same vision and ideas carries enormous energy. That is precisely what political figures would like to accumulate for their cause. With this goal in mind, they can sometimes resort to distorting or hiding the truth, unintentionally or on purpose, which opens the door for misinformation and disinformation. Tools for automatic detection of check-worthy claims would be of great help to moderators of debates, journalists, and fact-checking organizations. While previous work on detecting check-worthy claims has focused on text, here we explore the utility of the audio signal as an additional information source. We create a new multimodal dataset (text and audio in English) containing 48 hours of speech. Our evaluation results show that the audio modality together with text yields improvements over text alone in the case of multiple speakers. Moreover, an audio-only model could outperform a text-only one for a single speaker.
    Intelligent Energy Management with IoT Framework in Smart Cities Using Intelligent Analysis: An Application of Machine Learning Methods for Complex Networks and Systems. (arXiv:2306.05567v1 [cs.LG])
    Smart buildings are increasingly using Internet of Things (IoT)-based wireless sensing systems to reduce their energy consumption and environmental impact. As a result of their compact size and ability to sense, measure, and compute all electrical properties, Internet of Things devices have become increasingly important in our society. A major contribution of this study is the development of a comprehensive IoT-based framework for smart city energy management, incorporating multiple components of IoT architecture and framework. An IoT framework for intelligent energy management applications that employ intelligent analysis is an essential system component that collects and stores information. Additionally, it serves as a platform for the development of applications by other companies. Furthermore, we have studied intelligent energy management solutions based on intelligent mechanisms. The depletion of energy resources and the increase in energy demand have led to a rise in energy consumption and building maintenance costs. The collected data are used to monitor, control, and enhance the efficiency of the system.
    Evaluating and Incentivizing Diverse Data Contributions in Collaborative Learning. (arXiv:2306.05592v1 [cs.GT])
    For a federated learning model to perform well, it is crucial to have a diverse and representative dataset. However, the data contributors may only be concerned with the performance on a specific subset of the population, which may not reflect the diversity of the wider population. This creates a tension between the principal (the FL platform designer) who cares about global performance and the agents (the data collectors) who care about local performance. In this work, we formulate this tension as a game between the principal and multiple agents, and focus on the linear experiment design problem to formally study their interaction. We show that the statistical criterion used to quantify the diversity of the data, as well as the choice of the federated learning algorithm used, has a significant effect on the resulting equilibrium. We leverage this to design simple optimal federated learning mechanisms that encourage data collectors to contribute data representative of the global population, thereby maximizing global performance.
    Communication-Efficient Zeroth-Order Distributed Online Optimization: Algorithm, Theory, and Applications. (arXiv:2306.05655v1 [cs.LG])
    This paper focuses on a multi-agent zeroth-order online optimization problem in a federated learning setting for target tracking. The agents only sense their current distances to their targets and aim to maintain a minimum safe distance from each other to prevent collisions. The coordination among the agents and dissemination of collision-prevention information is managed by a central server using the federated learning paradigm. The proposed formulation leads to an instance of a distributed online nonconvex optimization problem that is solved by a group of communication-constrained agents. To deal with the communication limitations of the agents, an error feedback-based compression scheme is utilized for agent-to-server communication. The proposed algorithm is analyzed theoretically for the general class of distributed online nonconvex optimization problems. We provide non-asymptotic convergence rates that show the dominant term is independent of the characteristics of the compression scheme. Our theoretical results feature a new approach that employs significantly more relaxed assumptions in comparison to standard literature. The performance of the proposed solution is further analyzed numerically in terms of tracking errors and collisions between agents in two relevant applications.
    Two-level histograms for dealing with outliers and heavy tail distributions. (arXiv:2306.05786v1 [cs.LG])
    Histograms are among the most popular methods used in exploratory analysis to summarize univariate distributions. In particular, irregular histograms are good non-parametric density estimators that require very few parameters: the number of bins with their lengths and frequencies. Many approaches have been proposed in the literature to infer these parameters, either assuming hypotheses about the underlying data distributions or exploiting a model selection approach. In this paper, we focus on the G-Enum histogram method, which exploits the Minimum Description Length (MDL) principle to build histograms without any user parameter and achieves state-of-the-art performance w.r.t. accuracy, parsimony, and computation time. We investigate the limits of this method in the case of outliers or heavy-tailed distributions. We suggest a two-level heuristic to deal with such cases. The first level exploits a logarithmic transformation of the data to split the data set into a list of data subsets with a controlled range of values. The second level builds a sub-histogram for each data subset and aggregates them to obtain a complete histogram. Extensive experiments show the benefits of the approach.
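The two-level heuristic can be sketched directly from the description. This is a minimal illustration only: the split by sign and order of magnitude stands in for the logarithmic transformation, and the per-subset bin count is fixed here, whereas the paper selects each sub-histogram with the parameter-free G-Enum/MDL criterion:

```python
import numpy as np

def two_level_histogram(x, bins_per_level=10):
    """Sketch of the two-level idea: level 1 splits the nonzero data by
    sign and order of magnitude (via a log transform of the range);
    level 2 builds one histogram per subset, and the sub-histograms are
    aggregated into a complete histogram."""
    x = np.asarray(x, dtype=float)
    nonzero = x[x != 0]                          # zeros would form their own trivial bin
    mags = np.floor(np.log10(np.abs(nonzero)))   # order of magnitude of each value
    counts_all, edges_all = [], []
    for sign in (-1.0, 1.0):
        for m in np.unique(mags[np.sign(nonzero) == sign]):
            sub = nonzero[(np.sign(nonzero) == sign) & (mags == m)]
            c, e = np.histogram(sub, bins=bins_per_level)
            counts_all.append(c)
            edges_all.append(e)
    return counts_all, edges_all
```

Because each subset has a controlled range of values, an extreme outlier ends up isolated in its own magnitude band instead of stretching the bins of the whole histogram.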
    AI Enhanced Control Engineering Methods. (arXiv:2306.05545v1 [math.OC])
    AI and machine learning based approaches are becoming ubiquitous in almost all engineering fields. Control engineering cannot escape this trend. In this paper, we explore how AI tools can be useful in control applications. The core tool we focus on is automatic differentiation. Two immediate applications are linearization of system dynamics for local stability analysis or for state estimation using Kalman filters. We also explore other usages such as conversion of differential algebraic equations to ordinary differential equations for control design. In addition, we explore the use of machine learning models for global parameterizations of state vectors and control inputs in model predictive control applications. For each considered use case, we give examples and results.
    Reevaluating Loss Functions: Enhancing Robustness to Label Noise in Deep Learning Models. (arXiv:2306.05497v1 [cs.LG])
    Large annotated datasets inevitably contain incorrect labels, which poses a major challenge for the training of deep neural networks as they easily fit the labels. Good generalization performance can be achieved only when training with a robust model that is not easily distracted by the noise. A simple yet effective way to create a noise robust model is to use a noise robust loss function. However, the number of proposed loss functions is large, they often come with hyperparameters, and may learn slower than the widely used but noise sensitive Cross Entropy loss. By heuristic considerations and extensive numerical experiments, we study in which situations the proposed loss functions are applicable and give suggestions on how to choose an appropriate loss. Additionally, we propose a novel technique to enhance learning with bounded loss functions: the inclusion of an output bias, i.e. a slight increase in the neuron pre-activation corresponding to the correct label. Surprisingly, we find that this not only significantly improves the learning of bounded losses, but also leads to the Mean Absolute Error loss outperforming the Cross Entropy loss on the CIFAR-100 dataset - even in the absence of additional label noise. This suggests that training with a bounded loss function can be advantageous even in the presence of minimal label noise. To further strengthen our analysis of the learning behavior of different loss functions, we additionally design and test a novel loss function denoted as Bounded Cross Entropy.
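The output-bias idea is easy to state concretely. A minimal sketch, assuming an MAE loss over softmax outputs and a hypothetical bias value of 2.0 (the paper's exact bias schedule and loss setup may differ):

```python
import numpy as np

def mae_loss_with_output_bias(logits, labels, bias=2.0):
    """Mean Absolute Error loss on softmax outputs, with the output-bias
    trick: slightly raise the pre-activation (logit) of the correct
    class before the softmax."""
    z = logits.copy()
    z[np.arange(len(labels)), labels] += bias   # boost the correct-label logit
    z = z - z.max(axis=1, keepdims=True)        # numerically stable softmax
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    onehot = np.eye(z.shape[1])[labels]
    return np.abs(p - onehot).sum(axis=1).mean()
```

The bias pushes the correct class's probability up, which increases the otherwise vanishing gradient signal that bounded losses such as MAE provide early in training.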
    SGLD-Based Information Criteria and the Over-Parameterized Regime. (arXiv:2306.05583v1 [cs.LG])
    Double-descent refers to the unexpected drop in test loss of a learning algorithm beyond an interpolating threshold with over-parameterization, which is not predicted by information criteria in their classical forms due to the limitations in the standard asymptotic approach. We update these analyses using the information risk minimization framework and provide Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) for models learned by stochastic gradient Langevin dynamics (SGLD). Notably, the AIC and BIC penalty terms for SGLD correspond to specific information measures, i.e., symmetrized KL information and KL divergence. We extend this information-theoretic analysis to over-parameterized models by characterizing the SGLD-based BIC for the random feature model in the regime where the number of parameters $p$ and the number of samples $n$ tend to infinity, with $p/n$ fixed. Our experiments demonstrate that the refined SGLD-based BIC can track the double-descent curve, providing meaningful guidance for model selection and revealing new insights into the behavior of SGLD learning algorithms in the over-parameterized regime.
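For reference, the SGLD update underlying the analysis is a plain gradient step plus injected Gaussian noise. A minimal single-step sketch (learning rate and noise scale chosen for illustration):

```python
import numpy as np

def sgld_step(theta, grad_loss, lr=1e-3, rng=None):
    """One stochastic gradient Langevin dynamics (SGLD) update: a
    gradient step plus Gaussian noise with variance 2*lr, so the
    iterates sample from a Gibbs posterior rather than converging to
    a single point estimate."""
    rng = rng or np.random.default_rng(0)
    noise = np.sqrt(2.0 * lr) * rng.standard_normal(theta.shape)
    return theta - lr * grad_loss(theta) + noise
```

It is the distribution of these noisy iterates that the information-theoretic AIC/BIC penalties in the paper are computed against.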
    Latent Phrase Matching for Dysarthric Speech. (arXiv:2306.05446v1 [eess.AS])
    Many consumer speech recognition systems are not tuned for people with speech disabilities, resulting in poor recognition and user experience, especially for severe speech differences. Recent studies have emphasized interest in personalized speech models from people with atypical speech patterns. We propose a query-by-example-based personalized phrase recognition system that is trained using small amounts of speech, is language agnostic, does not assume a traditional pronunciation lexicon, and generalizes well across speech difference severities. On an internal dataset collected from 32 people with dysarthria, this approach works regardless of severity and shows a 60% improvement in recall relative to a commercial speech recognition system. On the public EasyCall dataset of dysarthric speech, our approach improves accuracy by 30.5%. Performance degrades as the number of phrases increases, but consistently outperforms ASR systems when trained with 50 unique phrases.
    Boosting with Tempered Exponential Measures. (arXiv:2306.05487v1 [cs.LG])
    One of the most popular ML algorithms, AdaBoost, can be derived from the dual of a relative entropy minimization problem subject to the fact that the positive weights on the examples sum to one. Essentially, harder examples receive higher probabilities. We generalize this setup to the recently introduced {\it tempered exponential measure}s (TEMs) where normalization is enforced on a specific power of the measure and not the measure itself. TEMs are indexed by a parameter $t$ and generalize exponential families ($t=1$). Our algorithm, $t$-AdaBoost, recovers AdaBoost as a special case ($t=1$). We show that $t$-AdaBoost retains AdaBoost's celebrated exponential convergence rate when $t\in [0,1)$ while allowing a slight improvement of the rate's hidden constant compared to $t=1$. $t$-AdaBoost partially computes on a generalization of classical arithmetic over the reals and brings notable properties like guaranteed bounded leveraging coefficients for $t\in [0,1)$. From the loss that $t$-AdaBoost minimizes (a generalization of the exponential loss), we show how to derive a new family of {\it tempered} losses for the induction of domain-partitioning classifiers like decision trees. Crucially, strict properness is ensured for all while their boosting rates span the full known spectrum. Experiments using $t$-AdaBoost+trees show that significant leverage can be achieved by tuning $t$.
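For context, the classical AdaBoost update that $t$-AdaBoost generalizes (and recovers at $t=1$) can be written in a few lines. A minimal sketch of one boosting round, assuming $\pm 1$ margins and a weak learner with error strictly between 0 and 1:

```python
import numpy as np

def adaboost_update(weights, margins):
    """One weight update of classical AdaBoost (the t=1 case of
    t-AdaBoost): given margins y_i * h(x_i) in {-1, +1} for weak
    learner h, compute its leveraging coefficient alpha and re-weight
    the examples so that harder (misclassified) ones get more mass."""
    err = weights[margins < 0].sum() / weights.sum()   # weighted error, assumed in (0, 1)
    alpha = 0.5 * np.log((1 - err) / err)              # leveraging coefficient
    w = weights * np.exp(-alpha * margins)             # exponential re-weighting
    return w / w.sum(), alpha
```

A well-known property of this update is that after re-normalization the misclassified examples hold exactly half of the total mass; $t$-AdaBoost tempers the exponential with a $t$-indexed analogue, which is what bounds the leveraging coefficients for $t\in[0,1)$.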
    DynamoRep: Trajectory-Based Population Dynamics for Classification of Black-box Optimization Problems. (arXiv:2306.05438v1 [cs.LG])
    The application of machine learning (ML) models to the analysis of optimization algorithms requires the representation of optimization problems using numerical features. These features can be used as input for ML models that are trained to select or to configure a suitable algorithm for the problem at hand. Since in pure black-box optimization information about the problem instance can only be obtained through function evaluation, a common approach is to dedicate some function evaluations for feature extraction, e.g., using random sampling. This approach has two key downsides: (1) It reduces the budget left for the actual optimization phase, and (2) it neglects valuable information that could be obtained from a problem-solver interaction. In this paper, we propose a feature extraction method that describes the trajectories of optimization algorithms using simple descriptive statistics. We evaluate the generated features for the task of classifying problem classes from the Black Box Optimization Benchmarking (BBOB) suite. We demonstrate that the proposed DynamoRep features capture enough information to identify the problem class on which the optimization algorithm is running, achieving a mean classification accuracy of 95% across all experiments.
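The feature extraction described here is deliberately simple and can be sketched directly. A minimal illustration, assuming the trajectory is available as a list of (population, fitness) pairs and using min/max/mean/std as the descriptive statistics (the exact statistics and layout in the paper may differ):

```python
import numpy as np

def dynamorep_features(trajectory):
    """DynamoRep-style features: for each iteration of a population-based
    optimizer, record simple descriptive statistics (min, max, mean, std)
    of the population's coordinates and of the fitness values, then
    concatenate over iterations."""
    feats = []
    for pop, fit in trajectory:
        for arr in (np.asarray(pop).ravel(), np.asarray(fit).ravel()):
            feats += [arr.min(), arr.max(), arr.mean(), arr.std()]
    return np.array(feats)
```

Because the features come from the optimizer's own trajectory, no extra function evaluations are spent on feature extraction, which addresses the first downside mentioned above.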
    Trajectory Prediction with Observations of Variable-Length for Motion Planning in Highway Merging scenarios. (arXiv:2306.05478v1 [cs.RO])
    Accurate trajectory prediction of nearby vehicles is crucial for the safe motion planning of automated vehicles in dynamic driving scenarios such as highway merging. Existing methods cannot initiate prediction for a vehicle unless observed for a fixed duration of two or more seconds. This prevents a fast reaction by the ego vehicle to vehicles that enter its perception range, thus creating safety concerns. Therefore, this paper proposes a novel transformer-based trajectory prediction approach, specifically trained to handle any observation length larger than one frame. We perform a comprehensive evaluation of the proposed method using two large-scale highway trajectory datasets, namely the highD and exiD. In addition, we study the impact of the proposed prediction approach on motion planning and control tasks using extensive merging scenarios from the exiD dataset. To the best of our knowledge, this marks the first instance where such a large-scale highway merging dataset has been employed for this purpose. The results demonstrate that the prediction model achieves state-of-the-art performance on the highD dataset and maintains a lower prediction error than the constant velocity baseline across all observation lengths on exiD. Moreover, it significantly enhances safety, comfort, and efficiency in dense traffic scenarios, as compared to the constant velocity model.
    Robust Brain Age Estimation via Regression Models and MRI-derived Features. (arXiv:2306.05514v1 [eess.IV])
The determination of biological brain age is a crucial biomarker in the assessment of neurological disorders and understanding of the morphological changes that occur during aging. Various machine learning models have been proposed for estimating brain age through Magnetic Resonance Imaging (MRI) of healthy controls. However, developing a robust brain age estimation (BAE) framework has been challenging due to the selection of appropriate MRI-derived features and the high cost of MRI acquisition. In this study, we present a novel BAE framework using the Open Big Healthy Brain (OpenBHB) dataset, a new multi-site and publicly available benchmark dataset that includes region-wise feature metrics derived from T1-weighted (T1-w) brain MRI scans of 3965 healthy controls aged between 6 and 86 years. Our approach integrates three different MRI-derived region-wise features and different regression models, resulting in a highly accurate brain age estimation with a Mean Absolute Error (MAE) of 3.25 years, demonstrating the framework's robustness. We also analyze our model's regression-based performance on gender-wise (male and female) healthy test groups. The proposed BAE framework provides a new approach for estimating brain age, which has important implications for the understanding of neurological disorders and age-related brain changes.
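As a rough illustration of this kind of regression setup, the sketch below fits an ordinary least-squares brain-age model on synthetic stand-ins for region-wise features and reports the MAE. The data generation, dimensions, and model choice are assumptions for the sketch, not the paper's actual pipeline or features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for region-wise MRI features: age drives each feature
# plus noise (the real OpenBHB features are, of course, far richer).
n, d = 200, 5
age = rng.uniform(6, 86, size=n)
X = np.column_stack([age * w + rng.normal(0, 5, size=n)
                     for w in rng.uniform(0.5, 2.0, size=d)])
X = np.column_stack([X, np.ones(n)])          # bias column

# Ordinary least-squares brain-age regressor.
coef, *_ = np.linalg.lstsq(X, age, rcond=None)
pred = X @ coef
mae = np.mean(np.abs(pred - age))             # mean absolute error in years
```

In a real framework, the least-squares step would be replaced by the regression models under comparison, evaluated on held-out subjects.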
    AMEE: A Robust Framework for Explanation Evaluation in Time Series Classification. (arXiv:2306.05501v1 [cs.LG])
    This paper aims to provide a framework to quantitatively evaluate and rank explanation methods for the time series classification task, which deals with a prevalent data type in critical domains such as healthcare and finance. The recent surge of research interest in explanation methods for time series classification has provided a great variety of explanation techniques. Nevertheless, when these explanation techniques disagree on a specific problem, it remains unclear which of them to use. Comparing the explanations to find the right answer is non-trivial. Two key challenges remain: how to quantitatively and robustly evaluate the informativeness (i.e., relevance for the classification task) of a given explanation method, and how to compare explanation methods side-by-side. We propose AMEE, a Model-Agnostic Explanation Evaluation framework for quantifying and comparing multiple saliency-based explanations for time series classification. Perturbation is added to the input time series guided by the saliency maps (i.e., importance weights for each point in the time series). The impact of perturbation on classification accuracy is measured and used for explanation evaluation. The results show that perturbing discriminative parts of the time series leads to significant changes in classification accuracy. To be robust to different types of perturbations and different types of classifiers, we aggregate the accuracy loss across perturbations and classifiers. This allows us to objectively quantify and rank different explanation methods. We provide a quantitative and qualitative analysis for synthetic datasets, a variety of UCR benchmark datasets, as well as a real-world dataset with known expert ground truth.
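The perturbation-based evaluation loop can be sketched roughly as: perturb the time-series points a saliency map marks as important, then measure the resulting loss in classification accuracy. Everything below (the toy series, threshold classifier, and hand-made saliency maps) is illustrative, not the AMEE implementation; an informative map should cause a larger accuracy drop than an uninformative one.

```python
import random
import statistics

random.seed(0)

def make_series(label, length=50):
    # The class signal lives in the first half of the series.
    base = 1.0 if label else 0.0
    return [base + random.gauss(0, 0.2) if i < length // 2
            else random.gauss(0, 0.2) for i in range(length)]

def classify(series):
    # Toy classifier: threshold on the mean of the first half.
    return int(statistics.mean(series[:len(series) // 2]) > 0.5)

def accuracy_after_perturbation(data, saliency, k=15):
    correct = 0
    for series, label in data:
        # Replace the k most salient points with noise.
        top = set(sorted(range(len(series)),
                         key=lambda i: saliency[i], reverse=True)[:k])
        perturbed = [random.gauss(0, 1) if i in top else v
                     for i, v in enumerate(series)]
        correct += classify(perturbed) == label
    return correct / len(data)

data = [(make_series(l), l) for l in [0, 1] * 50]
informative = [1.0 if i < 25 else 0.0 for i in range(50)]    # marks the signal
uninformative = [0.0 if i < 25 else 1.0 for i in range(50)]  # marks the noise
acc_inf = accuracy_after_perturbation(data, informative)
acc_uninf = accuracy_after_perturbation(data, uninformative)
```

Aggregating such accuracy losses across perturbation types and classifiers, as the paper does, turns this single comparison into a robust ranking of explanation methods.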
    Simulation and Prediction of Countercurrent Spontaneous Imbibition at Early and Late Times Using Physics-Informed Neural Networks. (arXiv:2306.05554v1 [physics.comp-ph])
Countercurrent spontaneous imbibition (COUCSI) is a process in porous materials in which a wetting phase displaces a non-wetting phase. In this work, we investigate for the first time the application of Physics-Informed Neural Networks (PINNs) in solving the 1D COUCSI problem at both early (ET) and late (LT) times. Also novel, we examine the Change-of-Variables technique for improving the performance of PINNs. We formulated the COUCSI problem in three equivalent forms by changing the independent variables: XT-, XY-, and Z-formulations. The first describes saturation as a function of normalized position X and time T; the second as a function of X and Y=T^0.5; and the third as a sole function of Z=X/T^0.5 (valid only at ET). The PINN model was generated using a feed-forward neural network and trained based on minimizing a weighted loss function, including the physics-informed loss term and terms corresponding to the initial and boundary conditions. No synthetic or experimental data were involved in the training. All three formulations could closely approximate the correct solutions (obtained by fine-grid numerical simulations), with water saturation mean absolute errors (MAE) around 0.019 and 0.009 for the XT and XY formulations and 0.012 for the Z formulation at ET. The Z formulation perfectly captured the self-similarity of the system at ET, which was captured less well by the XT and XY formulations. The total variation (TV) of saturation was preserved in the Z formulation, and it was better preserved with the XY than the XT formulation. It was demonstrated that redefining the problem based on physics-inspired variables reduced the non-linearity of the problem and allowed higher solution accuracies, a higher degree of loss-landscape convexity, a lower number of required collocation points, smaller network sizes, and more computationally efficient solutions.
    On Performance Discrepancies Across Local Homophily Levels in Graph Neural Networks. (arXiv:2306.05557v1 [cs.SI])
Research on GNNs has highlighted a relationship between high homophily (i.e., the tendency for nodes of a similar class to connect) and strong predictive performance in node classification. However, recent research has found the relationship to be more nuanced, demonstrating that even simple GNNs can learn in certain heterophilous settings. To bridge the gap between these findings, we revisit the assumptions made in previous works and identify that datasets are often treated as having a constant homophily level across nodes. To align closer to real-world datasets, we theoretically and empirically study the performance of GNNs when the local homophily level of a node deviates at test-time from the global homophily level of its graph. To aid our theoretical analysis, we introduce a new parameter to the preferential attachment model commonly used in homophily analysis to enable the control of local homophily levels in generated graphs, enabling a systematic empirical study on how local homophily can impact performance. We additionally perform a granular analysis on a number of real-world datasets with varying global homophily levels. Across our theoretical and empirical results, we find that (a) GNNs can fail to generalize to test nodes that deviate from the global homophily of a graph, (b) high local homophily does not necessarily confer high performance for a node, and (c) GNN models designed to handle heterophily are able to perform better across varying heterophily ranges irrespective of the dataset's global homophily. These findings point towards a GNN's over-reliance on the global homophily used for training and motivate the need to design GNNs that can better generalize across large local homophily ranges.
    CARSO: Counter-Adversarial Recall of Synthetic Observations. (arXiv:2306.06081v1 [cs.CV])
    In this paper, we propose a novel adversarial defence mechanism for image classification -- CARSO -- inspired by cues from cognitive neuroscience. The method is synergistically complementary to adversarial training and relies on knowledge of the internal representation of the attacked classifier. Exploiting a generative model for adversarial purification, conditioned on such representation, it samples reconstructions of inputs to be finally classified. Experimental evaluation by a well-established benchmark of varied, strong adaptive attacks, across diverse image datasets and classifier architectures, shows that CARSO is able to defend the classifier significantly better than state-of-the-art adversarial training alone -- with a tolerable clean accuracy toll. Furthermore, the defensive architecture succeeds in effectively shielding itself from unforeseen threats, and end-to-end attacks adapted to fool stochastic defences. Code and pre-trained models are available at https://github.com/emaballarin/CARSO .
    Is Attentional Channel Processing Design Required? Comprehensive Analysis Of Robustness Between Vision Transformers And Fully Attentional Networks. (arXiv:2306.05495v1 [cs.CV])
Robustness testing has been performed for standard CNN models and Vision Transformers; however, there is a lack of a comprehensive study comparing the robustness of traditional Vision Transformers, which lack an extra attentional channel design, with the latest fully attentional network (FAN) models. In this paper, we use the ImageNet dataset to compare the robustness of FAN models with that of traditional Vision Transformers, to understand the role of an attentional channel processing design using white-box attacks, and to study the transferability between the two using black-box attacks.
    Multi-Modal Classifiers for Open-Vocabulary Object Detection. (arXiv:2306.05493v1 [cs.CV])
The goal of this paper is open-vocabulary object detection (OVOD) – building a model that can detect objects beyond the set of categories seen at training, thus enabling the user to specify categories of interest at inference without the need for model retraining. We adopt a standard two-stage object detector architecture, and explore three ways for specifying novel categories: via language descriptions, via image exemplars, or via a combination of the two. We make three contributions: first, we prompt a large language model (LLM) to generate informative language descriptions for object classes, and construct powerful text-based classifiers; second, we employ a visual aggregator on image exemplars that can ingest any number of images as input, forming vision-based classifiers; and third, we provide a simple method to fuse information from language descriptions and image exemplars, yielding a multi-modal classifier. When evaluating on the challenging LVIS open-vocabulary benchmark we demonstrate that: (i) our text-based classifiers outperform all previous OVOD works; (ii) our vision-based classifiers perform as well as text-based classifiers in prior work; (iii) our multi-modal classifiers perform better than either modality alone; and finally, (iv) our text-based and multi-modal classifiers yield better performance than a fully-supervised detector.
    Word-Level Explanations for Analyzing Bias in Text-to-Image Models. (arXiv:2306.05500v1 [cs.CL])
Text-to-image models take a sentence (i.e., prompt) and generate images associated with this input prompt. These models have created award-winning art, videos, and even synthetic datasets. However, text-to-image (T2I) models can generate images that underrepresent minorities based on race and sex. This paper investigates which word in the input prompt is responsible for bias in generated images. We introduce a method for computing scores for each word in the prompt; these scores represent its influence on biases in the model's output. Our method follows the principle of \emph{explaining by removing}, leveraging masked language models to calculate the influence scores. We perform experiments on Stable Diffusion to demonstrate that our method identifies the replication of societal stereotypes in generated images.
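The explaining-by-removing principle can be illustrated with a toy scorer: remove each word in turn and record how a bias score changes. The paper computes these scores with masked language models; the lexicon-based scorer and example prompt below are hypothetical stand-ins for illustration only.

```python
def word_influence(prompt, scorer):
    """Score each word by how much the bias score drops when it is removed."""
    words = prompt.split()
    full = scorer(words)
    scores = {}
    for i, w in enumerate(words):
        reduced = words[:i] + words[i + 1:]
        # Influence of a word = change in bias score upon its removal.
        scores[w] = full - scorer(reduced)
    return scores

# Hypothetical bias metric: fraction of words drawn from a "biased" lexicon.
biased_lexicon = {"nurse", "secretary"}
scorer = lambda ws: sum(w in biased_lexicon for w in ws) / max(len(ws), 1)

scores = word_influence("a nurse at the hospital", scorer)
```

Under this toy metric, removing "nurse" changes the score the most, so it receives the highest influence; the paper's method plays the same game with a far richer, model-based score.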
    Check Me If You Can: Detecting ChatGPT-Generated Academic Writing using CheckGPT. (arXiv:2306.05524v1 [cs.CL])
With ChatGPT under the spotlight, utilizing large language models (LLMs) for academic writing has drawn a significant amount of discussion and concern in the community. While substantial research efforts have been stimulated for detecting LLM-Generated Content (LLM-content), most of the attempts are still in the early stage of exploration. In this paper, we present a holistic investigation of detecting LLM-generated academic writing, by providing a dataset, evidence, and algorithms, in order to inspire more community effort to address the concern of LLM academic misuse. We first present GPABenchmark, a benchmarking dataset of 600,000 samples of human-written, GPT-written, GPT-completed, and GPT-polished abstracts of research papers in CS, physics, and humanities and social sciences (HSS). We show that existing open-source and commercial GPT detectors provide unsatisfactory performance on GPABenchmark, especially for GPT-polished text. Moreover, through a user study of 150+ participants, we show that it is highly challenging for human users, including experienced faculty members and researchers, to identify GPT-generated abstracts. We then present CheckGPT, a novel LLM-content detector consisting of a general representation module and an attentive-BiLSTM classification module, which is accurate, transferable, and interpretable. Experimental results show that CheckGPT achieves an average classification accuracy of 98% to 99% for the task-specific, discipline-specific detectors and the unified detectors. CheckGPT is also highly transferable: without tuning, it achieves ~90% accuracy in new domains, such as news articles, while a model tuned with approximately 2,000 samples in the target domain achieves ~98% accuracy. Finally, we demonstrate the explainability insights obtained from CheckGPT to reveal the key behaviors of how LLMs generate texts.
    Detection of Late Blight Disease in Tomato Leaf Using Image Processing Techniques. (arXiv:2306.06080v1 [cs.CV])
One of the most frequently farmed crops is the tomato. Late blight is the most prevalent tomato disease in the world and often causes a significant reduction in the production of tomato crops. The importance of tomatoes as an agricultural product necessitates early detection of late blight, which is caused by the fungus Phytophthora. The earliest signs of late blight on tomatoes are unevenly formed, water-soaked lesions on the younger leaves of the plant canopy. In humid environments, white cottony growth may appear on the undersides of the affected leaves. As the disease progresses, the lesions enlarge and the leaves turn brown, shrivel, and die. In this work, late blight is detected using image segmentation and a multi-class SVM technique: image segmentation is employed to separate damaged areas on the leaves, and the multi-class SVM is used for reliable disease categorization. 30 reputable studies were chosen from a total of 2770 recognized papers. The primary goal of this study is to compile cutting-edge research that identifies current research trends, problems, and prospects for late blight detection. It also examines current approaches for applying image processing to diagnose and detect late blight. A taxonomy for late blight detection is also proposed, along with a model for developing solutions to the identified problems. Finally, the research gaps are presented as open issues to provide future directions for researchers in image processing.
    Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning. (arXiv:2306.05445v1 [physics.chem-ph])
    Advances in deep learning have greatly improved structure prediction of molecules. However, many macroscopic observations that are important for real-world applications are not functions of a single molecular structure, but rather determined from the equilibrium distribution of structures. Traditional methods for obtaining these distributions, such as molecular dynamics simulation, are computationally expensive and often intractable. In this paper, we introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems. Inspired by the annealing process in thermodynamics, DiG employs deep neural networks to transform a simple distribution towards the equilibrium distribution, conditioned on a descriptor of a molecular system, such as a chemical graph or a protein sequence. This framework enables efficient generation of diverse conformations and provides estimations of state densities. We demonstrate the performance of DiG on several molecular tasks, including protein conformation sampling, ligand structure sampling, catalyst-adsorbate sampling, and property-guided structure generation. DiG presents a significant advancement in methodology for statistically understanding molecular systems, opening up new research opportunities in molecular science.
    Path Neural Networks: Expressive and Accurate Graph Neural Networks. (arXiv:2306.05955v1 [cs.LG])
    Graph neural networks (GNNs) have recently become the standard approach for learning with graph-structured data. Prior work has shed light into their potential, but also their limitations. Unfortunately, it was shown that standard GNNs are limited in their expressive power. These models are no more powerful than the 1-dimensional Weisfeiler-Leman (1-WL) algorithm in terms of distinguishing non-isomorphic graphs. In this paper, we propose Path Neural Networks (PathNNs), a model that updates node representations by aggregating paths emanating from nodes. We derive three different variants of the PathNN model that aggregate single shortest paths, all shortest paths and all simple paths of length up to K. We prove that two of these variants are strictly more powerful than the 1-WL algorithm, and we experimentally validate our theoretical results. We find that PathNNs can distinguish pairs of non-isomorphic graphs that are indistinguishable by 1-WL, while our most expressive PathNN variant can even distinguish between 3-WL indistinguishable graphs. The different PathNN variants are also evaluated on graph classification and graph regression datasets, where in most cases, they outperform the baseline methods.
    Towards End-to-end Speech-to-text Summarization. (arXiv:2306.05432v1 [cs.CL])
Speech-to-text (S2T) summarization is a time-saving technique for filtering and keeping up with the broadcast news uploaded online on a daily basis. The rise of large language models from deep learning with impressive text generation capabilities has placed the research focus on summarization systems that produce paraphrased compact versions of the document content, also known as abstractive summaries. End-to-end (E2E) modelling of S2T abstractive summarization is a promising approach that offers the possibility of generating rich latent representations that leverage non-verbal and acoustic information, as opposed to the use of only linguistic information from automatically generated transcripts in cascade systems. However, the little existing literature on E2E modelling of this task fails to explore different domains, notably broadcast news, a challenging domain where large and diversified volumes of data are presented to the user every day. We model S2T summarization both with a cascade and an E2E system for a corpus of broadcast news in French. Our novel E2E model leverages external data by resorting to transfer learning from a pre-trained T2T summarizer. Experiments show that both our cascade and E2E abstractive summarizers are stronger than an extractive baseline. However, the performance of the E2E model still lies behind the cascade one, which is the object of an extensive analysis that includes future directions to close that gap.
    Learning Not to Spoof. (arXiv:2306.06087v1 [cs.LG])
As intelligent trading agents based on reinforcement learning (RL) gain prevalence, it becomes more important to ensure that RL agents obey laws, regulations, and human behavioral expectations. There is substantial literature concerning the aversion of obvious catastrophes like crashing a helicopter or bankrupting a trading account, but little around the avoidance of subtle non-normative behavior for which there are examples, but no programmable definition. Such behavior may violate legal or regulatory, rather than physical or monetary, constraints. In this article, I consider a series of experiments in which an intelligent stock trading agent maximizes profit but may also inadvertently learn to spoof the market in which it participates. I first inject a hand-coded spoofing agent into a multi-agent market simulation and learn to recognize spoofing activity sequences. Then I replace the hand-coded spoofing trader with a simple profit-maximizing RL agent and observe that it independently discovers spoofing as the optimal strategy. Finally, I introduce a method to incorporate the recognizer as a normative guide, shaping the agent's perceived rewards and altering its selected actions. The agent remains profitable while avoiding spoofing behaviors that would result in even higher profit. After presenting the empirical results, I conclude with some recommendations. The method should generalize to the reduction of any unwanted behavior for which a recognizer can be learned.
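The reward-shaping step — penalizing the agent's raw profit by the recognizer's belief that recent actions constitute spoofing — might look roughly like this. The hand-coded recognizer and action names below are crude stand-ins for the paper's learned sequence recognizer, used only to make the shaping mechanic concrete.

```python
def shaped_reward(raw_reward, action_history, recognizer, penalty=10.0):
    """Subtract a penalty proportional to the recognizer's belief that the
    recent action sequence constitutes spoofing."""
    return raw_reward - penalty * recognizer(action_history)

# Hypothetical recognizer: flags placing-then-cancelling large orders,
# a crude stand-in for a learned spoofing classifier.
def toy_recognizer(history):
    spoofy = sum(1 for a, b in zip(history, history[1:])
                 if a.startswith("place_large") and b == "cancel")
    return min(1.0, spoofy / max(len(history) - 1, 1))

honest = ["buy", "hold", "sell", "hold"]
spoof = ["place_large_sell", "cancel", "place_large_sell", "cancel"]
r_honest = shaped_reward(5.0, honest, toy_recognizer)
r_spoof = shaped_reward(8.0, spoof, toy_recognizer)
```

Even though the spoofing trajectory earns a higher raw profit (8.0 vs 5.0), the shaped reward reverses the ordering, steering the agent toward the compliant behavior.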
    Differentially Private Image Classification by Learning Priors from Random Processes. (arXiv:2306.06076v1 [cs.CV])
    In privacy-preserving machine learning, differentially private stochastic gradient descent (DP-SGD) performs worse than SGD due to per-sample gradient clipping and noise addition. A recent focus in private learning research is improving the performance of DP-SGD on private data by incorporating priors that are learned on real-world public data. In this work, we explore how we can improve the privacy-utility tradeoff of DP-SGD by learning priors from images generated by random processes and transferring these priors to private data. We propose DP-RandP, a three-phase approach. We attain new state-of-the-art accuracy when training from scratch on CIFAR10, CIFAR100, and MedMNIST for a range of privacy budgets $\varepsilon \in [1, 8]$. In particular, we improve the previous best reported accuracy on CIFAR10 from $60.6 \%$ to $72.3 \%$ for $\varepsilon=1$. Our code is available at https://github.com/inspire-group/DP-RandP.
    On the Importance of Exploration for Generalization in Reinforcement Learning. (arXiv:2306.05483v1 [cs.LG])
    Existing approaches for improving generalization in deep reinforcement learning (RL) have mostly focused on representation learning, neglecting RL-specific aspects such as exploration. We hypothesize that the agent's exploration strategy plays a key role in its ability to generalize to new environments. Through a series of experiments in a tabular contextual MDP, we show that exploration is helpful not only for efficiently finding the optimal policy for the training environments but also for acquiring knowledge that helps decision making in unseen environments. Based on these observations, we propose EDE: Exploration via Distributional Ensemble, a method that encourages exploration of states with high epistemic uncertainty through an ensemble of Q-value distributions. Our algorithm is the first value-based approach to achieve state-of-the-art on both Procgen and Crafter, two benchmarks for generalization in RL with high-dimensional observations. The open-sourced implementation can be found at https://github.com/facebookresearch/ede .
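The ensemble-based exploration bonus can be sketched as: act greedily with respect to the mean Q-value plus a multiple of the across-ensemble standard deviation, a proxy for epistemic uncertainty. The tiny tabular setup below is illustrative, not the EDE implementation, which learns full Q-value distributions with deep networks.

```python
import statistics

def ede_action(q_ensemble, state, bonus=1.0):
    """Pick the action maximizing mean Q plus an uncertainty bonus
    (stdev across ensemble members), encouraging exploration of
    actions the ensemble disagrees about."""
    n_actions = len(q_ensemble[0][state])
    scores = []
    for a in range(n_actions):
        qs = [member[state][a] for member in q_ensemble]
        scores.append(statistics.mean(qs) + bonus * statistics.stdev(qs))
    return max(range(n_actions), key=lambda a: scores[a])

# Three ensemble members; action 1 has a lower mean but high disagreement.
ensemble = [
    {"s0": [1.0, 0.2]},
    {"s0": [1.0, 1.8]},
    {"s0": [1.0, 0.1]},
]
a = ede_action(ensemble, "s0", bonus=1.0)
```

With the bonus enabled, the agent explores the uncertain action 1; with `bonus=0.0` it falls back to plain greedy action selection (action 0).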
    DeepSeaNet: Improving Underwater Object Detection using EfficientDet. (arXiv:2306.06075v1 [cs.CV])
Marine animals and deep underwater objects are difficult to recognize and monitor for the safety of aquatic life. The challenge increases when the water is saline with granular particles and impurities. In such a natural adversarial environment, traditional approaches like CNNs start to fail and are expensive to compute. This project involves implementing and evaluating various object detection models, including EfficientDet, YOLOv5, YOLOv8, and Detectron2, on an existing annotated underwater dataset, called the Brackish-Dataset. The dataset comprises annotated image sequences of fish, crabs, starfish, and other aquatic animals captured in Limfjorden water with limited visibility. The aim of this research project is to study the efficiency of newer models on the same dataset and contrast them with the previous results based on accuracy and inference time. Firstly, I compare the results of YOLOv3 (31.10% mean Average Precision (mAP)), YOLOv4 (83.72% mAP), YOLOv5 (97.6% mAP), YOLOv8 (98.20% mAP), EfficientDet (98.56% mAP) and Detectron2 (95.20% mAP) on the same dataset. Secondly, I provide a modified BiSkFPN mechanism (BiFPN neck with skip connections) to perform complex feature fusion in adversarial noise, which makes the modified EfficientDet robust to perturbations. Third, I analyze the effect of adversarial learning on the accuracy of EfficientDet (98.63% mAP) and YOLOv5 (98.04% mAP). Last, I provide class activation map (CAM) based explanations for the two models to promote explainability in black-box models. Overall, the results indicate that the modified EfficientDet achieved higher accuracy with five-fold cross validation than the other models, with 88.54% IoU of feature maps.
    Self-Interpretable Time Series Prediction with Counterfactual Explanations. (arXiv:2306.06024v1 [cs.LG])
    Interpretable time series prediction is crucial for safety-critical areas such as healthcare and autonomous driving. Most existing methods focus on interpreting predictions by assigning important scores to segments of time series. In this paper, we take a different and more challenging route and aim at developing a self-interpretable model, dubbed Counterfactual Time Series (CounTS), which generates counterfactual and actionable explanations for time series predictions. Specifically, we formalize the problem of time series counterfactual explanations, establish associated evaluation protocols, and propose a variational Bayesian deep learning model equipped with counterfactual inference capability of time series abduction, action, and prediction. Compared with state-of-the-art baselines, our self-interpretable model can generate better counterfactual explanations while maintaining comparable prediction accuracy.
    Adaptive Contextual Perception: How to Generalize to New Backgrounds and Ambiguous Objects. (arXiv:2306.05963v1 [cs.CV])
Biological vision systems make adaptive use of context to recognize objects in new settings with novel contexts as well as occluded or blurry objects in familiar settings. In this paper, we investigate how vision models adaptively use context for out-of-distribution (OOD) generalization and leverage our analysis results to improve model OOD generalization. First, we formulate two distinct OOD settings where the contexts are either irrelevant (Background-Invariance) or beneficial (Object-Disambiguation), reflecting the diverse contextual challenges faced in biological vision. We then analyze model performance in these two different OOD settings and demonstrate that models that excel in one setting tend to struggle in the other. Notably, prior works on learning causal features improve on one setting but hurt in the other. This underscores the importance of generalizing across both OOD settings, as this ability is crucial for both human cognition and robust AI systems. Next, to better understand the model properties contributing to OOD generalization, we use representational geometry analysis and our own probing methods to examine a population of models, and we discover that those with more factorized representations and appropriate feature weighting are more successful in handling Background-Invariance and Object-Disambiguation tests. We further validate these findings through causal intervention on representation factorization and feature weighting to demonstrate their causal effect on performance. Lastly, we propose new augmentation methods to enhance model generalization. These methods outperform strong baselines, yielding improvements in both in-distribution and OOD tests. In conclusion, to replicate the generalization abilities of biological vision, computer vision models must have factorized object vs. background representations and appropriately weight both kinds of features.
    Reconstructing Human Expressiveness in Piano Performances with a Transformer Network. (arXiv:2306.06040v1 [cs.SD])
    Capturing intricate and subtle variations in human expressiveness in music performance using computational approaches is challenging. In this paper, we propose a novel approach for reconstructing human expressiveness in piano performance with a multi-layer bi-directional Transformer encoder. To address the needs for large amounts of accurately captured and score-aligned performance data in training neural networks, we use transcribed scores obtained from an existing transcription model to train our model. We integrate pianist identities to control the sampling process and explore the ability of our system to model variations in expressiveness for different pianists. The system is evaluated through statistical analysis of generated expressive performances and a listening test. Overall, the results suggest that our method achieves state-of-the-art in generating human-like piano performances from transcribed scores, while fully and consistently reconstructing human expressiveness poses further challenges.
    Neural Algorithmic Reasoning for Combinatorial Optimisation. (arXiv:2306.06064v1 [cs.NE])
Solving NP-hard/complete combinatorial problems with neural networks is a challenging research area that aims to surpass classical approximate algorithms. The long-term objective is to outperform hand-designed heuristics for NP-hard/complete problems by learning to generate superior solutions solely from training data. The Travelling Salesman Problem (TSP) is a prominent combinatorial optimisation problem often targeted by such approaches. However, current neural-based methods for solving TSP often overlook the inherent "algorithmic" nature of the problem. In contrast, heuristics designed for TSP frequently leverage well-established algorithms, such as those for finding the minimum spanning tree. In this paper, we propose leveraging recent advancements in neural algorithmic reasoning to improve the learning of TSP problems. Specifically, we suggest pre-training our neural model on relevant algorithms before training it on TSP instances. Our results demonstrate that, using this learning setup, we achieve superior performance compared to non-algorithmically informed deep learning models.
    A Dynamical Graph Prior for Relational Inference. (arXiv:2306.06041v1 [cs.LG])
Relational inference aims to identify interactions between parts of a dynamical system from the observed dynamics. Current state-of-the-art methods fit a graph neural network (GNN) on a learnable graph to the dynamics. They use one-step message-passing GNNs -- intuitively the right choice since non-locality of multi-step or spectral GNNs may confuse direct and indirect interactions. But the effective interaction graph depends on the sampling rate and it is rarely localized to direct neighbors, leading to local minima for the one-step model. In this work, we propose a dynamical graph prior (DYGR) for relational inference. The reason we call it a prior is that, contrary to established practice, it constructively uses error amplification in high-degree non-local polynomial filters to generate good gradients for graph learning. To deal with non-uniqueness, DYGR simultaneously fits a "shallow" one-step model with shared graph topology. Experiments show that DYGR reconstructs graphs far more accurately than earlier methods, with remarkable robustness to under-sampling. Since appropriate sampling rates for unknown dynamical systems are not known a priori, this robustness makes DYGR suitable for real applications in scientific machine learning.
    Improving Fairness and Robustness in End-to-End Speech Recognition through unsupervised clustering. (arXiv:2306.06083v1 [cs.SD])
    The challenge of fairness arises when Automatic Speech Recognition (ASR) systems do not perform equally well for all sub-groups of the population. In the past few years there have been many improvements in overall speech recognition quality, but without any particular focus on advancing Equality and Equity for all user groups for whom systems do not perform well. ASR fairness is therefore also a robustness issue. Meanwhile, data privacy also takes priority in production systems. In this paper, we present a privacy preserving approach to improve fairness and robustness of end-to-end ASR without using metadata, zip codes, or even speaker or utterance embeddings directly in training. We extract utterance level embeddings using a speaker ID model trained on a public dataset, which we then use in an unsupervised fashion to create acoustic clusters. We use cluster IDs instead of speaker utterance embeddings as extra features during model training, which shows improvements for all demographic groups and in particular for different accents.
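The core mechanism described above can be sketched in a few lines: cluster utterance-level embeddings without supervision, then feed cluster IDs (rather than the embeddings themselves) to the ASR model as extra features. This is a minimal illustration only; the paper's embedding model, cluster count, and feature-injection point are not specified here, and the tiny k-means below stands in for whatever clustering the authors used.

```python
import numpy as np

def kmeans_cluster_ids(embeddings, k, n_iter=20, seed=0):
    """Assign each utterance embedding to one of k acoustic clusters
    (a minimal Lloyd's k-means; the paper's exact clustering setup may differ)."""
    rng = np.random.default_rng(seed)
    centers = embeddings[rng.choice(len(embeddings), size=k, replace=False)]
    for _ in range(n_iter):
        # nearest-center assignment
        d = np.linalg.norm(embeddings[:, None, :] - centers[None, :, :], axis=-1)
        ids = d.argmin(axis=1)
        # recompute centers, leaving empty clusters where they are
        for j in range(k):
            if np.any(ids == j):
                centers[j] = embeddings[ids == j].mean(axis=0)
    return ids

def add_cluster_feature(features, ids, k):
    """Append a one-hot cluster ID to each utterance's feature vector,
    instead of the raw speaker embedding, to preserve privacy."""
    one_hot = np.eye(k)[ids]
    return np.concatenate([features, one_hot], axis=-1)

# two well-separated synthetic embedding groups standing in for accents
emb = np.vstack([np.zeros((5, 4)), np.ones((5, 4)) * 10.0])
ids = kmeans_cluster_ids(emb, k=2)
aug = add_cluster_feature(emb, ids, k=2)
```

The privacy point is carried by `add_cluster_feature`: only the discrete cluster index reaches training, never the speaker embedding itself.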
    Expectation-Complete Graph Representations with Homomorphisms. (arXiv:2306.05838v1 [cs.LG])
    We investigate novel random graph embeddings that can be computed in expected polynomial time and that are able to distinguish all non-isomorphic graphs in expectation. Previous graph embeddings have limited expressiveness and either cannot distinguish all graphs or cannot be computed efficiently for every graph. To be able to approximate arbitrary functions on graphs, we are interested in efficient alternatives that become arbitrarily expressive with increasing resources. Our approach is based on Lov\'asz' characterisation of graph isomorphism through an infinite dimensional vector of homomorphism counts. Our empirical evaluation shows competitive results on several benchmark graph learning tasks.
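A concrete way to see the Lovász characterisation at work is to compute a small slice of the homomorphism-count vector by brute force. The two patterns below (an edge and a triangle) are illustrative choices; the paper's contribution is sampling patterns so that the embedding stays expected-polynomial and expressive, which this sketch omits.

```python
import itertools

def hom_count(pattern_edges, n_pattern, adj):
    """Count homomorphisms from a small pattern graph into the graph given by
    adjacency matrix `adj` (brute force over all vertex maps; the paper samples
    patterns to remain expected-polynomial, which we skip here)."""
    n = len(adj)
    count = 0
    for phi in itertools.product(range(n), repeat=n_pattern):
        if all(adj[phi[u]][phi[v]] for u, v in pattern_edges):
            count += 1
    return count

def hom_embedding(adj, patterns):
    """A finite slice of the Lovász homomorphism-count vector."""
    return [hom_count(edges, k, adj) for edges, k in patterns]

# patterns: single edge K2 and triangle K3
K2 = ([(0, 1)], 2)
K3 = ([(0, 1), (1, 2), (0, 2)], 3)

tri = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]    # triangle graph
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # path on 3 vertices
```

Even these two coordinates already separate the triangle from the path: the triangle admits 6 homomorphic images of K3, the path none.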
    Robust Data-driven Prescriptiveness Optimization. (arXiv:2306.05937v1 [math.OC])
    The abundance of data has led to the emergence of a variety of optimization techniques that attempt to leverage available side information to provide more anticipative decisions. The wide range of methods and contexts of application have motivated the design of a universal unitless measure of performance known as the coefficient of prescriptiveness. This coefficient was designed to quantify both the quality of contextual decisions compared to a reference one and the prescriptive power of side information. To identify policies that maximize the former in a data-driven context, this paper introduces a distributionally robust contextual optimization model where the coefficient of prescriptiveness substitutes for the classical empirical risk minimization objective. We present a bisection algorithm to solve this model, which relies on solving a series of linear programs when the distributional ambiguity set has an appropriate nested form and polyhedral structure. Studying a contextual shortest path problem, we evaluate the robustness of the resulting policies against alternative methods when the out-of-sample dataset is subject to varying amounts of distribution shift.
    Neural FIM for learning Fisher Information Metrics from point cloud data. (arXiv:2306.06062v1 [cs.CV])
Although data diffusion embeddings are ubiquitous in unsupervised learning and have proven to be a viable technique for uncovering the underlying intrinsic geometry of data, diffusion embeddings are inherently limited due to their discrete nature. To this end, we propose neural FIM, a method for computing the Fisher information metric (FIM) from point cloud data - allowing for a continuous manifold model for the data. Neural FIM creates an extensible metric space from discrete point cloud data such that information from the metric can inform us of manifold characteristics such as volume and geodesics. We demonstrate Neural FIM's utility in selecting parameters for the PHATE visualization method, as well as its ability to obtain information pertaining to local volume, illuminating branching points and cluster centers in embeddings of a toy dataset and two single-cell datasets of IPSC reprogramming and PBMCs (immune cells).
    Approximate information state based convergence analysis of recurrent Q-learning. (arXiv:2306.05991v1 [cs.LG])
    In spite of the large literature on reinforcement learning (RL) algorithms for partially observable Markov decision processes (POMDPs), a complete theoretical understanding is still lacking. In a partially observable setting, the history of data available to the agent increases over time so most practical algorithms either truncate the history to a finite window or compress it using a recurrent neural network leading to an agent state that is non-Markovian. In this paper, it is shown that in spite of the lack of the Markov property, recurrent Q-learning (RQL) converges in the tabular setting. Moreover, it is shown that the quality of the converged limit depends on the quality of the representation which is quantified in terms of what is known as an approximate information state (AIS). Based on this characterization of the approximation error, a variant of RQL with AIS losses is presented. This variant performs better than a strong baseline for RQL that does not use AIS losses. It is demonstrated that there is a strong correlation between the performance of RQL over time and the loss associated with the AIS representation.
    Agent market orders representation through a contrastive learning approach. (arXiv:2306.05987v1 [q-fin.ST])
Thanks to access to the labeled orders on the CAC40 data from Euronext, we are able to analyse agents' behaviours in the market based on their placed orders. In this study, we construct a self-supervised learning model using triplet loss to effectively learn the representation of agent market orders. By acquiring this learned representation, various downstream tasks become feasible. In this work, we utilise the K-means clustering algorithm on the learned representation vectors of agent orders to identify distinct behaviour types within each cluster.
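The triplet objective above is standard and can be sketched directly: pull an anchor order's embedding toward an order from the same agent (positive) and away from another agent's order (negative). The Euclidean distance and margin value below are common defaults, not necessarily the paper's exact choices.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss on order embeddings: same-agent pairs
    (anchor, positive) are pulled together, other-agent pairs pushed apart."""
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin).mean()

a = np.array([[0.0, 0.0]])   # anchor order embedding
p = np.array([[0.0, 0.1]])   # same-agent order, close: loss should vanish
n = np.array([[3.0, 0.0]])   # different agent, far away
```

Once trained, the embedding vectors can be handed to K-means exactly as the abstract describes, clustering agents by behaviour type.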
    Uncertainty-Aware Bootstrap Learning for Joint Extraction on Distantly-Supervised Data. (arXiv:2305.03827v2 [cs.CL] UPDATED)
Jointly extracting entity pairs and their relations is challenging when working on distantly-supervised data with ambiguous or noisy labels. To mitigate such impact, we propose uncertainty-aware bootstrap learning, which is motivated by the intuition that the higher the uncertainty of an instance, the more likely the model confidence is inconsistent with the ground truths. Specifically, we first explore instance-level data uncertainty to create an initial set of high-confidence examples. This subset serves to filter out noisy instances and helps the model converge quickly at the early stage. During bootstrap learning, we propose self-ensembling as a regularizer to alleviate inter-model uncertainty produced by noisy labels. We further define the probability variance of joint tagging probabilities to estimate inner-model parametric uncertainty, which is used to select and build up new reliable training instances for the next iteration. Experimental results on two large datasets reveal that our approach outperforms existing strong baselines and related methods.
    Reflected Diffusion Models. (arXiv:2304.04740v3 [stat.ML] UPDATED)
    Score-based diffusion models learn to reverse a stochastic differential equation that maps data to noise. However, for complex tasks, numerical error can compound and result in highly unnatural samples. Previous work mitigates this drift with thresholding, which projects to the natural data domain (such as pixel space for images) after each diffusion step, but this leads to a mismatch between the training and generative processes. To incorporate data constraints in a principled manner, we present Reflected Diffusion Models, which instead reverse a reflected stochastic differential equation evolving on the support of the data. Our approach learns the perturbed score function through a generalized score matching loss and extends key components of standard diffusion models including diffusion guidance, likelihood-based training, and ODE sampling. We also bridge the theoretical gap with thresholding: such schemes are just discretizations of reflected SDEs. On standard image benchmarks, our method is competitive with or surpasses the state of the art without architectural modifications and, for classifier-free guidance, our approach enables fast exact sampling with ODEs and produces more faithful samples under high guidance weight.
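The claim that thresholding schemes are discretizations of reflected SDEs can be illustrated with a toy Euler-Maruyama simulation: after each step, the state is folded back into the data support (here the unit interval) rather than clipped. The drift, noise scale, and domain below are illustrative assumptions, not the paper's image-space setup.

```python
import numpy as np

def reflect(x):
    """Fold a point back into [0, 1]: the discrete analogue of the reflected
    SDE's boundary term (valid for moderate overshoots)."""
    x = np.abs(x)              # reflect at the lower boundary 0
    x = 1.0 - np.abs(1.0 - x)  # reflect at the upper boundary 1
    return x

def reflected_euler_maruyama(x0, drift, sigma, dt, n_steps, seed=0):
    """One sample path of dX = drift(X) dt + sigma dW, reflected into [0, 1]."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x + drift(x) * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
        x = reflect(x)
    return x

# mean-reverting toy dynamics on a 4-dimensional state
path_end = reflected_euler_maruyama(np.full(4, 0.5), lambda x: -x, 0.5, 0.01, 200)
```

Unlike hard clipping, reflection preserves the increment's magnitude at the boundary, which is why the training and generative processes stay matched in the reflected formulation.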
    On the Connection Between MPNN and Graph Transformer. (arXiv:2301.11956v3 [cs.LG] UPDATED)
    Graph Transformer (GT) recently has emerged as a new paradigm of graph learning algorithms, outperforming the previously popular Message Passing Neural Network (MPNN) on multiple benchmarks. Previous work (Kim et al., 2022) shows that with proper position embedding, GT can approximate MPNN arbitrarily well, implying that GT is at least as powerful as MPNN. In this paper, we study the inverse connection and show that MPNN with virtual node (VN), a commonly used heuristic with little theoretical understanding, is powerful enough to arbitrarily approximate the self-attention layer of GT. In particular, we first show that if we consider one type of linear transformer, the so-called Performer/Linear Transformer (Choromanski et al., 2020; Katharopoulos et al., 2020), then MPNN + VN with only O(1) depth and O(1) width can approximate a self-attention layer in Performer/Linear Transformer. Next, via a connection between MPNN + VN and DeepSets, we prove the MPNN + VN with O(n^d) width and O(1) depth can approximate the self-attention layer arbitrarily well, where d is the input feature dimension. Lastly, under some assumptions, we provide an explicit construction of MPNN + VN with O(1) width and O(n) depth approximating the self-attention layer in GT arbitrarily well. On the empirical side, we demonstrate that 1) MPNN + VN is a surprisingly strong baseline, outperforming GT on the recently proposed Long Range Graph Benchmark (LRGB) dataset, 2) our MPNN + VN improves over early implementation on a wide range of OGB datasets and 3) MPNN + VN outperforms Linear Transformer and MPNN on the climate modeling task.
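The mechanism behind the MPNN + VN result is easy to see in a degenerate case: a virtual node that mean-pools all node features and broadcasts the result back performs exactly the global mixing of a uniform-attention layer. The mean pool and weight shapes below are illustrative, not the paper's explicit construction.

```python
import numpy as np

def virtual_node_round(X, W_local, W_global):
    """One MPNN round with a virtual node: the VN pools all node features
    (a mean here) and broadcasts the pooled message back to every node."""
    vn = X.mean(axis=0, keepdims=True)   # virtual node state: global pool
    return X @ W_local + vn @ W_global   # local update + broadcast message

def uniform_attention(X, W_v):
    """Self-attention with all attention weights equal: the simplest case
    a virtual node can reproduce exactly."""
    n = X.shape[0]
    A = np.full((n, n), 1.0 / n)         # uniform attention matrix
    return A @ X @ W_v

X = np.random.default_rng(0).standard_normal((5, 3))
out_vn = virtual_node_round(X, np.zeros((3, 3)), np.eye(3))
out_attn = uniform_attention(X, np.eye(3))
```

The paper's results go far beyond this toy case, showing that with enough width or depth the VN broadcast can approximate genuinely data-dependent attention weights.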
    Quartile-Based Seasonality Decomposition for Time Series Forecasting and Anomaly Detection. (arXiv:2306.05989v1 [cs.LG])
    The timely detection of anomalies is essential in the telecom domain as it facilitates the identification and characterization of irregular patterns, abnormal behaviors, and network anomalies, contributing to enhanced service quality and operational efficiency. Precisely forecasting and eliminating predictable time series patterns constitutes a vital component of time series anomaly detection. While the state-of-the-art methods aim to maximize forecasting accuracy, the computational performance takes a hit. In a system composed of a large number of time series variables, e.g., cell Key Performance Indicators (KPIs), the time and space complexity of the forecasting employed is of crucial importance. Quartile-Based Seasonality Decomposition (QBSD) is a live forecasting method proposed in this paper to make an optimal trade-off between computational complexity and forecasting accuracy. This paper compares the performance of QBSD to the state-of-the-art forecasting methods and their applicability to practical anomaly detection. To demonstrate the efficacy of the proposed solution, experimental evaluation was conducted using publicly available datasets as well as a telecom KPI dataset.
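One plausible reading of a quartile-based seasonal scheme, sketched below: for a series with known period, keep per-phase quartiles of past observations; forecast with the seasonal median and flag values falling outside a Tukey-style interquartile band. QBSD's exact rules are in the paper; this sketch only illustrates why the approach is cheap (a few quantiles per phase, no model fitting).

```python
import numpy as np

def qbsd_fit(series, period):
    """Per seasonal phase, compute quartiles of past observations
    (an illustrative reading of quartile-based seasonality decomposition)."""
    phases = [series[p::period] for p in range(period)]
    q1 = np.array([np.percentile(x, 25) for x in phases])
    q2 = np.array([np.percentile(x, 50) for x in phases])
    q3 = np.array([np.percentile(x, 75) for x in phases])
    return q1, q2, q3

def qbsd_forecast_and_flag(t, value, q1, q2, q3, k=1.5):
    """Forecast = seasonal median; flag values outside a Tukey-style band."""
    p = t % len(q2)
    iqr = q3[p] - q1[p]
    lo, hi = q1[p] - k * iqr, q3[p] + k * iqr
    return q2[p], not (lo <= value <= hi)

# a clean signal of period 4 repeated over 8 cycles (e.g. a well-behaved KPI)
base = np.array([1.0, 5.0, 3.0, 2.0])
series = np.tile(base, 8)
q1, q2, q3 = qbsd_fit(series, period=4)
```

Forecasting then costs a single table lookup per KPI, which is the complexity trade-off the abstract emphasises for systems with very many time series.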
    DDLP: Unsupervised Object-Centric Video Prediction with Deep Dynamic Latent Particles. (arXiv:2306.05957v1 [cs.CV])
    We propose a new object-centric video prediction algorithm based on the deep latent particle (DLP) representation. In comparison to existing slot- or patch-based representations, DLPs model the scene using a set of keypoints with learned parameters for properties such as position and size, and are both efficient and interpretable. Our method, deep dynamic latent particles (DDLP), yields state-of-the-art object-centric video prediction results on several challenging datasets. The interpretable nature of DDLP allows us to perform ``what-if'' generation -- predict the consequence of changing properties of objects in the initial frames, and DLP's compact structure enables efficient diffusion-based unconditional video generation. Videos, code and pre-trained models are available: https://taldatech.github.io/ddlp-web
    Cheating off your neighbors: Improving activity recognition through corroboration. (arXiv:2306.06078v1 [cs.CV])
    Understanding the complexity of human activities solely through an individual's data can be challenging. However, in many situations, surrounding individuals are likely performing similar activities, while existing human activity recognition approaches focus almost exclusively on individual measurements and largely ignore the context of the activity. Consider two activities: attending a small group meeting and working at an office desk. From solely an individual's perspective, it can be difficult to differentiate between these activities as they may appear very similar, even though they are markedly different. Yet, by observing others nearby, it can be possible to distinguish between these activities. In this paper, we propose an approach to enhance the prediction accuracy of an individual's activities by incorporating insights from surrounding individuals. We have collected a real-world dataset from 20 participants with over 58 hours of data including activities such as attending lectures, having meetings, working in the office, and eating together. Compared to observing a single person in isolation, our proposed approach significantly improves accuracy. We regard this work as a first step in collaborative activity recognition, opening new possibilities for understanding human activity in group settings.
    Automating Model Comparison in Factor Graphs. (arXiv:2306.05965v1 [cs.LG])
    Bayesian state and parameter estimation have been automated effectively in the literature, however, this has not yet been the case for model comparison, which therefore still requires error-prone and time-consuming manual derivations. As a result, model comparison is often overlooked and ignored, despite its importance. This paper efficiently automates Bayesian model averaging, selection, and combination by message passing on a Forney-style factor graph with a custom mixture node. Parameter and state inference, and model comparison can then be executed simultaneously using message passing with scale factors. This approach shortens the model design cycle and allows for the straightforward extension to hierarchical and temporal model priors to accommodate for modeling complicated time-varying processes.
    Extending Kernel PCA through Dualization: Sparsity, Robustness and Fast Algorithms. (arXiv:2306.05815v1 [cs.LG])
    The goal of this paper is to revisit Kernel Principal Component Analysis (KPCA) through dualization of a difference of convex functions. This allows to naturally extend KPCA to multiple objective functions and leads to efficient gradient-based algorithms avoiding the expensive SVD of the Gram matrix. Particularly, we consider objective functions that can be written as Moreau envelopes, demonstrating how to promote robustness and sparsity within the same framework. The proposed method is evaluated on synthetic and real-world benchmarks, showing significant speedup in KPCA training time as well as highlighting the benefits in terms of robustness and sparsity.
    Detecting Adversarial Directions in Deep Reinforcement Learning to Make Robust Decisions. (arXiv:2306.05873v1 [cs.LG])
Learning in MDPs with highly complex state representations is currently possible due to multiple advancements in reinforcement learning algorithm design. However, this increase in complexity, together with the growth in the dimensionality of the observations, came at the cost of volatility that can be exploited via adversarial attacks (i.e. moving along worst-case directions in the observation space). To solve this policy instability problem we propose a novel method to detect the presence of these non-robust directions via local quadratic approximation of the deep neural policy loss. Our method provides a theoretical basis for the fundamental cut-off between safe observations and adversarial observations. Furthermore, our technique is computationally efficient, and does not depend on the methods used to produce the worst-case directions. We conduct extensive experiments in the Arcade Learning Environment with several different adversarial attack techniques. Most significantly, we demonstrate the effectiveness of our approach even in the setting where non-robust directions are explicitly optimized to circumvent our proposed method.
    Robust Reinforcement Learning via Adversarial Kernel Approximation. (arXiv:2306.05859v1 [cs.LG])
    Robust Markov Decision Processes (RMDPs) provide a framework for sequential decision-making that is robust to perturbations on the transition kernel. However, robust reinforcement learning (RL) approaches in RMDPs do not scale well to realistic online settings with high-dimensional domains. By characterizing the adversarial kernel in RMDPs, we propose a novel approach for online robust RL that approximates the adversarial kernel and uses a standard (non-robust) RL algorithm to learn a robust policy. Notably, our approach can be applied on top of any underlying RL algorithm, enabling easy scaling to high-dimensional domains. Experiments in classic control tasks, MinAtar and DeepMind Control Suite demonstrate the effectiveness and the applicability of our method.
    Adaptivity Complexity for Causal Graph Discovery. (arXiv:2306.05781v1 [cs.LG])
    Causal discovery from interventional data is an important problem, where the task is to design an interventional strategy that learns the hidden ground truth causal graph $G(V,E)$ on $|V| = n$ nodes while minimizing the number of performed interventions. Most prior interventional strategies broadly fall into two categories: non-adaptive and adaptive. Non-adaptive strategies decide on a single fixed set of interventions to be performed while adaptive strategies can decide on which nodes to intervene on sequentially based on past interventions. While adaptive algorithms may use exponentially fewer interventions than their non-adaptive counterparts, there are practical concerns that constrain the amount of adaptivity allowed. Motivated by this trade-off, we study the problem of $r$-adaptivity, where the algorithm designer recovers the causal graph under a total of $r$ sequential rounds whilst trying to minimize the total number of interventions. For this problem, we provide a $r$-adaptive algorithm that achieves $O(\min\{r,\log n\} \cdot n^{1/\min\{r,\log n\}})$ approximation with respect to the verification number, a well-known lower bound for adaptive algorithms. Furthermore, for every $r$, we show that our approximation is tight. Our definition of $r$-adaptivity interpolates nicely between the non-adaptive ($r=1$) and fully adaptive ($r=n$) settings where our approximation simplifies to $O(n)$ and $O(\log n)$ respectively, matching the best-known approximation guarantees for both extremes. Our results also extend naturally to the bounded size interventions.
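The approximation guarantee interpolating between the two extremes can be checked numerically: evaluating O(min{r, log n} · n^(1/min{r, log n})) at r = 1 gives n, and at r ≥ log n it collapses to a constant times log n. Base-2 logs and the dropped constant factors are illustrative choices.

```python
import math

def approx_ratio(r, n):
    """The paper's O(min{r, log n} * n^(1 / min{r, log n})) bound, evaluated
    with base-2 logarithms (constant factors are illustrative)."""
    m = min(r, math.log2(n))
    return m * n ** (1.0 / m)

n = 1024  # log2(n) = 10
```

For n = 1024: a single non-adaptive round pays the full n = 1024, while 10 or more rounds pay only 2·log2(n) = 20, matching the abstract's claim that the bound simplifies to O(n) at r = 1 and O(log n) at r = n.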
    How Object Information Improves Skeleton-based Human Action Recognition in Assembly Tasks. (arXiv:2306.05844v1 [cs.CV])
    As the use of collaborative robots (cobots) in industrial manufacturing continues to grow, human action recognition for effective human-robot collaboration becomes increasingly important. This ability is crucial for cobots to act autonomously and assist in assembly tasks. Recently, skeleton-based approaches are often used as they tend to generalize better to different people and environments. However, when processing skeletons alone, information about the objects a human interacts with is lost. Therefore, we present a novel approach of integrating object information into skeleton-based action recognition. We enhance two state-of-the-art methods by treating object centers as further skeleton joints. Our experiments on the assembly dataset IKEA ASM show that our approach improves the performance of these state-of-the-art methods to a large extent when combining skeleton joints with objects predicted by a state-of-the-art instance segmentation model. Our research sheds light on the benefits of combining skeleton joints with object information for human action recognition in assembly tasks. We analyze the effect of the object detector on the combination for action classification and discuss the important factors that must be taken into account.
    Speaker Embeddings as Individuality Proxy for Voice Stress Detection. (arXiv:2306.05915v1 [eess.AS])
Since the mental states of the speaker modulate speech, stress introduced by cognitive or physical loads can be detected in the voice. The existing voice stress detection benchmark has shown that audio embeddings extracted from the Hybrid BYOL-S self-supervised model perform well. However, the benchmark only evaluates performance separately on each dataset; it does not evaluate performance across the different types of stress and different languages. Moreover, previous studies found strong individual differences in stress susceptibility. This paper presents the design and development of a voice stress detection system trained on more than 100 speakers from nine language groups and five different types of stress. We address individual variability in voice stress analysis by adding speaker embeddings to the hybrid BYOL-S features. The proposed method significantly improves voice stress detection performance with an input audio length of only 3-5 seconds.
    Is Normalization Indispensable for Multi-domain Federated Learning?. (arXiv:2306.05879v1 [cs.LG])
    Federated learning (FL) enhances data privacy with collaborative in-situ training on decentralized clients. Nevertheless, FL encounters challenges due to non-independent and identically distributed (non-i.i.d) data, leading to potential performance degradation and hindered convergence. While prior studies predominantly addressed the issue of skewed label distribution, our research addresses a crucial yet frequently overlooked problem known as multi-domain FL. In this scenario, clients' data originate from diverse domains with distinct feature distributions, as opposed to label distributions. To address the multi-domain problem in FL, we propose a novel method called Federated learning Without normalizations (FedWon). FedWon draws inspiration from the observation that batch normalization (BN) faces challenges in effectively modeling the statistics of multiple domains, while alternative normalization techniques possess their own limitations. In order to address these issues, FedWon eliminates all normalizations in FL and reparameterizes convolution layers with scaled weight standardization. Through comprehensive experimentation on four datasets and four models, our results demonstrate that FedWon surpasses both FedAvg and the current state-of-the-art method (FedBN) across all experimental setups, achieving notable improvements of over 10% in certain domains. Furthermore, FedWon is versatile for both cross-silo and cross-device FL, exhibiting strong performance even with a batch size as small as 1, thereby catering to resource-constrained devices. Additionally, FedWon effectively tackles the challenge of skewed label distribution.
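Scaled weight standardization, the mechanism FedWon substitutes for normalization layers, is straightforward to sketch: each output filter of a convolution is standardized to zero mean and unit variance, then rescaled by a learnable per-filter gain. The gain initialization and epsilon below are common normalization-free-network choices, not necessarily the paper's exact ones.

```python
import numpy as np

def scaled_weight_standardization(W, gain=None, eps=1e-5):
    """Standardize each output filter of a conv weight tensor (out_ch first)
    to zero mean / unit variance, then apply a learnable per-filter gain.
    This reparameterizes the layer instead of normalizing activations."""
    out_ch = W.shape[0]
    flat = W.reshape(out_ch, -1)
    mean = flat.mean(axis=1, keepdims=True)
    var = flat.var(axis=1, keepdims=True)
    std_flat = (flat - mean) / np.sqrt(var + eps)
    if gain is None:
        gain = np.ones((out_ch, 1))  # learnable in practice; identity init here
    return (gain * std_flat).reshape(W.shape)

# an 8-filter 3x3 conv weight with deliberately shifted, scaled entries
W = np.random.default_rng(0).standard_normal((8, 3, 3, 3)) * 4.0 + 2.0
Ws = scaled_weight_standardization(W)
```

Because the operation touches only the weights, it involves no batch statistics at all, which is why it sidesteps BN's difficulty with multi-domain clients and works even at batch size 1.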
    Incorporating Prior Knowledge in Deep Learning Models via Pathway Activity Autoencoders. (arXiv:2306.05813v1 [cs.LG])
Motivation: Despite advances in the computational analysis of high-throughput molecular profiling assays (e.g. transcriptomics), a dichotomy exists between methods that are simple and interpretable, and ones that are complex but with lower degree of interpretability. Furthermore, very few methods deal with trying to translate interpretability in biologically relevant terms, such as known pathway cascades. Biological pathways reflecting signalling events or metabolic conversions are of particular relevance here. Small improvements or modifications of existing algorithms will generally not be suitable, unless novel biological results have been predicted and verified. Determining which pathways are implicated in disease and incorporating such pathway data as prior knowledge may enhance predictive modelling and personalised strategies for diagnosis, treatment and prevention of disease. Results: We propose a novel prior-knowledge-based deep auto-encoding framework, PAAE, together with its accompanying generative variant, PAVAE, for RNA-seq data in cancer. Through comprehensive comparisons among various learning models, we show that, despite having access to a smaller set of features, our PAAE and PAVAE models achieve better out-of-set reconstruction results compared to common methodologies. Furthermore, we compare our model with equivalent baselines on a classification task and show that they achieve better results than models which have access to the full input gene set. Another result is that using vanilla variational frameworks might negatively impact both reconstruction outputs as well as classification performance. Finally, our work directly contributes by providing comprehensive interpretability analyses on our models on top of improving prognostication for translational medicine.
    Federated Learning You May Communicate Less Often!. (arXiv:2306.05862v1 [stat.ML])
We investigate the generalization error of statistical learning models in a Federated Learning (FL) setting. Specifically, we study the evolution of the generalization error with the number of communication rounds between the clients and the parameter server, i.e., the effect on the generalization error of how often the local models as computed by the clients are aggregated at the parameter server. We establish PAC-Bayes and rate-distortion theoretic bounds on the generalization error that account explicitly for the effect of the number of rounds, say $ R \in \mathbb{N}$, in addition to the number of participating devices $K$ and individual datasets size $n$. The bounds, which apply in their generality for a large class of loss functions and learning algorithms, appear to be the first of their kind for the FL setting. Furthermore, we apply our bounds to FL-type Support Vector Machines (FSVM); and we derive (more) explicit bounds on the generalization error in this case. In particular, we show that the generalization error of FSVM increases with $R$, suggesting that more frequent communication with the parameter server diminishes the generalization power of such learning algorithms. Combined with the fact that the empirical risk generally decreases for larger values of $R$, this indicates that $R$ might be a parameter to optimize in order to minimize the population risk of FL algorithms. Moreover, specialized to the case $R=1$ (sometimes referred to as "one-shot" FL or distributed learning) our bounds suggest that the generalization error of the FL setting decreases faster than that of centralized learning by a factor of $\mathcal{O}(\sqrt{\log(K)/K})$, thereby generalizing recent findings in this direction to arbitrary loss functions and algorithms. The results of this paper are also validated experimentally.
    HRTF upsampling with a generative adversarial network using a gnomonic equiangular projection. (arXiv:2306.05812v1 [eess.AS])
An individualised head-related transfer function (HRTF) is essential for creating realistic virtual reality (VR) and augmented reality (AR) environments. However, acoustically measuring high-quality HRTFs requires expensive equipment and an acoustic lab setting. To overcome these limitations and to make this measurement more efficient, HRTF upsampling has been explored in the past, where a high-resolution HRTF is created from a low-resolution one. This paper demonstrates how generative adversarial networks (GANs) can be applied to HRTF upsampling. We propose a novel approach that transforms the HRTF data for convenient use with a convolutional super-resolution generative adversarial network (SRGAN). This new approach is benchmarked against two baselines: barycentric upsampling and an HRTF selection approach. Experimental results show that the proposed method outperforms both baselines in terms of log-spectral distortion (LSD) and localisation performance using perceptual models when the input HRTF is sparse.
    Faster Discrete Convex Function Minimization with Predictions: The M-Convex Case. (arXiv:2306.05865v1 [cs.LG])
    Recent years have seen a growing interest in accelerating optimization algorithms with machine-learned predictions. Sakaue and Oki (NeurIPS 2022) have developed a general framework that warm-starts the L-convex function minimization method with predictions, revealing the idea's usefulness for various discrete optimization problems. In this paper, we present a framework for using predictions to accelerate M-convex function minimization, thus complementing previous research and extending the range of discrete optimization algorithms that can benefit from predictions. Our framework is particularly effective for an important subclass called laminar convex minimization, which appears in many operations research applications. Our methods can improve time complexity bounds upon the best worst-case results by using predictions and even have potential to go beyond a lower-bound result.
    TreeDQN: Learning to minimize Branch-and-Bound tree. (arXiv:2306.05905v1 [cs.LG])
Combinatorial optimization problems require an exhaustive search to find the optimal solution. A convenient approach to solving combinatorial optimization tasks in the form of Mixed Integer Linear Programs is Branch-and-Bound. A Branch-and-Bound solver splits a task into two parts by dividing the domain of an integer variable, then solves them recursively, producing a tree of nested sub-tasks. The efficiency of the solver depends on the branching heuristic used to select a variable for splitting. In the present work, we propose a reinforcement learning method that can efficiently learn the branching heuristic. We view the variable selection task as a tree Markov Decision Process, prove that the Bellman operator adapted for the tree Markov Decision Process is contracting in mean, and propose a modified learning objective for the reinforcement learning agent. Our agent requires less training data and produces smaller trees compared to previous reinforcement learning methods.
    2DeteCT -- A large 2D expandable, trainable, experimental Computed Tomography dataset for machine learning. (arXiv:2306.05907v1 [eess.IV])
    Recent research in computational imaging largely focuses on developing machine learning (ML) techniques for image reconstruction, which requires large-scale training datasets consisting of measurement data and ground-truth images. However, suitable experimental datasets for X-ray Computed Tomography (CT) are scarce, and methods are often developed and evaluated only on simulated data. We fill this gap by providing the community with a versatile, open 2D fan-beam CT dataset suitable for developing ML techniques for a range of image reconstruction tasks. To acquire it, we designed a sophisticated, semi-automatic scan procedure that utilizes a highly-flexible laboratory X-ray CT setup. A diverse mix of samples with high natural variability in shape and density was scanned slice-by-slice (5000 slices in total) with high angular and spatial resolution and three different beam characteristics: A high-fidelity, a low-dose and a beam-hardening-inflicted mode. In addition, 750 out-of-distribution slices were scanned with sample and beam variations to accommodate robustness and segmentation tasks. We provide raw projection data, reference reconstructions and segmentations based on an open-source data processing pipeline.
    Can Large Language Models Infer Causation from Correlation?. (arXiv:2306.05836v1 [cs.CL])
    Causal inference is one of the hallmarks of human intelligence. While the field of CausalNLP has attracted much interest in recent years, existing causal inference datasets in NLP primarily rely on discovering causality from empirical knowledge (e.g., commonsense knowledge). In this work, we propose the first benchmark dataset to test the pure causal inference skills of large language models (LLMs). Specifically, we formulate a novel task, Corr2Cause, which takes a set of correlational statements and determines the causal relationship between the variables. We curate a large-scale dataset of more than 400K samples, on which we evaluate seventeen existing LLMs. Through our experiments, we identify a key shortcoming of LLMs in terms of their causal inference skills, and show that these models perform close to random on the task. This shortcoming is somewhat mitigated when we try to re-purpose LLMs for this skill via finetuning, but we find that these models still fail to generalize -- they can only perform causal inference in in-distribution settings when variable names and textual expressions used in the queries are similar to those in the training set, but fail in out-of-distribution settings generated by perturbing these queries. Corr2Cause is a challenging task for LLMs, and would be helpful in guiding future research on improving LLMs' pure reasoning skills and generalizability. Our data is at https://huggingface.co/datasets/causalnlp/corr2cause. Our code is at https://github.com/causalNLP/corr2cause.
    C(NN)FD -- a deep learning framework for turbomachinery CFD analysis. (arXiv:2306.05889v1 [cs.LG])
    Deep Learning methods have seen a wide range of successful applications across different industries. Up until now, applications to physical simulations such as CFD (Computational Fluid Dynamics) have been limited to simple test cases of minor industrial relevance. This paper demonstrates the development of a novel deep learning framework for real-time predictions of the impact of manufacturing and build variations on the overall performance of axial compressors in gas turbines, with a focus on tip clearance variations. The associated scatter in efficiency can significantly increase the $CO_2$ emissions, thus being of great industrial and environmental relevance. The proposed \textit{C(NN)FD} architecture achieves, in real time, accuracy comparable to the CFD benchmark. Predicting the flow field and using it to calculate the corresponding overall performance renders the methodology generalisable, while filtering only relevant parts of the CFD solution makes the methodology scalable to industrial applications.
    Fair yet Asymptotically Equal Collaborative Learning. (arXiv:2306.05764v1 [cs.LG])
    In collaborative learning with streaming data, nodes (e.g., organizations) jointly and continuously learn a machine learning (ML) model by sharing the latest model updates computed from their latest streaming data. For the more resourceful nodes to be willing to share their model updates, they need to be fairly incentivized. This paper explores an incentive design that guarantees fairness so that nodes receive rewards commensurate to their contributions. Our approach leverages an explore-then-exploit formulation to estimate the nodes' contributions (i.e., exploration) for realizing our theoretically guaranteed fair incentives (i.e., exploitation). However, we observe a "rich get richer" phenomenon arising from the existing approaches to guarantee fairness and it discourages the participation of the less resourceful nodes. To remedy this, we additionally preserve asymptotic equality, i.e., less resourceful nodes achieve equal performance eventually to the more resourceful/"rich" nodes. We empirically demonstrate in two settings with real-world streaming data: federated online incremental learning and federated reinforcement learning, that our proposed approach outperforms existing baselines in fairness and learning performance while remaining competitive in preserving equality.
    Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model. (arXiv:2306.05720v1 [cs.CV])
    Latent diffusion models (LDMs) exhibit an impressive ability to produce realistic images, yet the inner workings of these models remain mysterious. Even when trained purely on images without explicit depth information, they typically output coherent pictures of 3D scenes. In this work, we investigate a basic interpretability question: does an LDM create and use an internal representation of simple scene geometry? Using linear probes, we find evidence that the internal activations of the LDM encode linear representations of both 3D depth data and a salient-object / background distinction. These representations appear surprisingly early in the denoising process$-$well before a human can easily make sense of the noisy images. Intervention experiments further indicate these representations play a causal role in image synthesis, and may be used for simple high-level editing of an LDM's output.
    DynaBench: A benchmark dataset for learning dynamical systems from low-resolution data. (arXiv:2306.05805v1 [cs.LG])
    Previous work on learning physical systems from data has focused on high-resolution grid-structured measurements. However, real-world knowledge of such systems (e.g. weather data) relies on sparsely scattered measuring stations. In this paper, we introduce a novel simulated benchmark dataset, DynaBench, for learning dynamical systems directly from sparsely scattered data without prior knowledge of the equations. The dataset focuses on predicting the evolution of a dynamical system from low-resolution, unstructured measurements. We simulate six different partial differential equations covering a variety of physical systems commonly used in the literature and evaluate several machine learning models, including traditional graph neural networks and point cloud processing models, with the task of predicting the evolution of the system. The proposed benchmark dataset is expected to advance the state of the art as an out-of-the-box, easy-to-use tool for evaluating models in a setting where only unstructured low-resolution observations are available. The benchmark is available at https://anonymous.4open.science/r/code-2022-dynabench/.
    Self-Paced Absolute Learning Progress as a Regularized Approach to Curriculum Learning. (arXiv:2306.05769v1 [cs.LG])
    The usability of Reinforcement Learning is restricted by the large computation times it requires. Curriculum Reinforcement Learning speeds up learning by defining a helpful order in which an agent encounters tasks, i.e. from simple to hard. Curricula based on Absolute Learning Progress (ALP) have proven successful in different environments, but waste computation on repeating already learned behaviour in new tasks. We solve this problem by introducing a new regularization method based on Self-Paced (Deep) Learning, called Self-Paced Absolute Learning Progress (SPALP). We evaluate our method in three different environments. Our method achieves performance comparable to original ALP in all cases, and reaches it quicker than ALP in two of them. We illustrate possibilities to further improve the efficiency and performance of SPALP.
    Quantitative Ink Analysis: Estimating the Number of Inks in Documents through Hyperspectral Imaging. (arXiv:2306.05784v1 [cs.LG])
    In the field of document forensics, ink analysis plays a crucial role in determining the authenticity of legal and historic documents and detecting forgery. Visual examination alone is insufficient for distinguishing visually similar inks, necessitating the use of advanced scientific techniques. This paper proposes an ink analysis technique based on hyperspectral imaging, which enables the examination of documents in hundreds of narrowly spaced spectral bands, revealing hidden details. The main objective of this study is to identify the number of distinct inks used in a document. Three clustering algorithms, namely k-means, Agglomerative, and c-means, are employed to estimate the number of inks present. The methodology involves data extraction, ink pixel segmentation, and ink number determination. The results demonstrate the effectiveness of the proposed technique in identifying ink clusters and distinguishing between different inks. The analysis of a hyperspectral cube dataset reveals variations in spectral reflectance across different bands and distinct spectral responses among the 12 lines, indicating the presence of multiple inks. The clustering algorithms successfully identify ink clusters, with k-means clustering showing superior classification performance. These findings contribute to the development of reliable methodologies for ink analysis using hyperspectral imaging, enhancing the
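The clustering step the abstract describes can be sketched with a minimal k-means on synthetic pixel spectra (a toy illustration; the names and the eight-band spectra below are hypothetical stand-ins for the paper's hyperspectral data):

```python
import random

def kmeans(pixels, k, iters=20):
    """Cluster pixel spectra (lists of per-band reflectances) into k groups."""
    # Deterministic spread initialization: evenly spaced pixels as seeds.
    centroids = [pixels[i * (len(pixels) - 1) // (k - 1)][:] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: attach each spectrum to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in pixels:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = [sum(band) / len(cl) for band in zip(*cl)]
    return centroids, clusters

# Synthetic stand-in for a hyperspectral cube: two inks, eight bands each.
rng = random.Random(0)
ink_a = [[0.10 + 0.02 * rng.random() for _ in range(8)] for _ in range(20)]
ink_b = [[0.90 - 0.02 * rng.random() for _ in range(8)] for _ in range(20)]
centroids, clusters = kmeans(ink_a + ink_b, k=2)
```

In practice the number of inks is unknown, so the algorithm would be run for several values of k and the best fit selected, which is exactly the estimation problem the paper addresses.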
    Tighter Lower Bounds for Shuffling SGD: Random Permutations and Beyond. (arXiv:2303.07160v2 [cs.LG] UPDATED)
    We study convergence lower bounds of without-replacement stochastic gradient descent (SGD) for solving smooth (strongly-)convex finite-sum minimization problems. Unlike most existing results focusing on final iterate lower bounds in terms of the number of components $n$ and the number of epochs $K$, we seek bounds for arbitrary weighted average iterates that are tight in all factors including the condition number $\kappa$. For SGD with Random Reshuffling, we present lower bounds that have tighter $\kappa$ dependencies than existing bounds. Our results are the first to perfectly close the gap between lower and upper bounds for weighted average iterates in both strongly-convex and convex cases. We also prove weighted average iterate lower bounds for arbitrary permutation-based SGD, which apply to all variants that carefully choose the best permutation. Our bounds improve the existing bounds in factors of $n$ and $\kappa$ and thereby match the upper bounds shown for a recently proposed algorithm called GraB.
    RankFormer: Listwise Learning-to-Rank Using Listwide Labels. (arXiv:2306.05808v1 [cs.IR])
    Web applications where users are presented with a limited selection of items have long employed ranking models to put the most relevant results first. Any feedback received from users is typically assumed to reflect a relative judgement on the utility of items, e.g. a user clicking on an item only implies it is better than items not clicked in the same ranked list. Hence, the objectives optimized in Learning-to-Rank (LTR) tend to be pairwise or listwise. Yet, by only viewing feedback as relative, we neglect the user's absolute feedback on the list's overall quality, e.g. when no items in the selection are clicked. We thus reconsider the standard LTR paradigm and argue the benefits of learning from this listwide signal. To this end, we propose the RankFormer as an architecture that, with a Transformer at its core, can jointly optimize a novel listwide assessment objective and a traditional listwise LTR objective. We simulate implicit feedback on public datasets and observe that the RankFormer succeeds in benefitting from listwide signals. Additionally, we conduct experiments in e-commerce on Amazon Search data and find the RankFormer to be superior to all baselines offline. An online experiment shows that knowledge distillation can be used to find immediate practical use for the RankFormer.
    Transformer-based Time-to-Event Prediction for Chronic Kidney Disease Deterioration. (arXiv:2306.05779v1 [cs.LG])
    Deep-learning techniques, particularly the transformer model, have shown great potential in enhancing the prediction performance of longitudinal health records. While previous methods have mainly focused on fixed-time risk prediction, time-to-event prediction (also known as survival analysis) is often more appropriate for clinical scenarios. Here, we present a novel deep-learning architecture we named STRAFE, a generalizable survival analysis transformer-based architecture for electronic health records. The performance of STRAFE was evaluated using a real-world claim dataset of over 130,000 individuals with stage 3 chronic kidney disease (CKD) and was found to outperform other time-to-event prediction algorithms in predicting the exact time of deterioration to stage 5. Additionally, STRAFE was found to outperform binary outcome algorithms in predicting fixed-time risk, possibly due to its ability to train on censored data. We show that STRAFE predictions can improve the positive predictive value of high-risk patients by 3-fold, demonstrating possible usage to improve targeting for intervention programs. Finally, we suggest a novel visualization approach to predictions on a per-patient basis. In conclusion, STRAFE is a cutting-edge time-to-event prediction algorithm that has the potential to enhance risk predictions in large claims datasets.
    Leaping through tree space: continuous phylogenetic inference for rooted and unrooted trees. (arXiv:2306.05739v1 [q-bio.PE])
    Phylogenetics is now fundamental in life sciences, providing insights into the earliest branches of life and the origins and spread of epidemics. However, finding suitable phylogenies from the vast space of possible trees remains challenging. To address this problem, for the first time, we perform both tree exploration and inference in a continuous space where the computation of gradients is possible. This continuous relaxation allows for major leaps across tree space in both rooted and unrooted trees, and is less susceptible to convergence to local minima. Our approach outperforms the current best methods for inference on unrooted trees and, in simulation, accurately infers the tree and root in ultrametric cases. The approach remains effective on empirical data even when data are scarce, as we demonstrate on the phylogeny of jawed vertebrates. Indeed, only a few genes with an ultrametric signal were generally sufficient for resolving the major lineages of vertebrates. With cubic-time complexity and efficient optimisation via automatic differentiation, our method presents an effective way forward for exploring the most difficult, data-deficient phylogenetic questions.
    An End-to-End Reinforcement Learning Approach for Job-Shop Scheduling Problems Based on Constraint Programming. (arXiv:2306.05747v1 [cs.AI])
    Constraint Programming (CP) is a declarative programming paradigm that allows for modeling and solving combinatorial optimization problems, such as the Job-Shop Scheduling Problem (JSSP). While CP solvers manage to find optimal or near-optimal solutions for small instances, they do not scale well to large ones, i.e., they require long computation times or yield low-quality solutions. Therefore, real-world scheduling applications often resort to fast, handcrafted, priority-based dispatching heuristics to find a good initial solution and then refine it using optimization methods. This paper proposes a novel end-to-end approach to solving scheduling problems by means of CP and Reinforcement Learning (RL). In contrast to previous RL methods, tailored for a given problem by including procedural simulation algorithms, complex feature engineering, or handcrafted reward functions, our neural-network architecture and training algorithm merely require a generic CP encoding of some scheduling problem along with a set of small instances. Our approach leverages existing CP solvers to train an agent learning a Priority Dispatching Rule (PDR) that generalizes well to large instances, even from separate datasets. We evaluate our method on seven JSSP datasets from the literature, showing its ability to find higher-quality solutions for very large instances than obtained by static PDRs and by a CP solver within the same time limit.
    Finite-Time Analysis of Minimax Q-Learning for Two-Player Zero-Sum Markov Games: Switching System Approach. (arXiv:2306.05700v1 [eess.SY])
    The objective of this paper is to investigate the finite-time analysis of a Q-learning algorithm applied to two-player zero-sum Markov games. Specifically, we establish a finite-time analysis of both the minimax Q-learning algorithm and the corresponding value iteration method. To enhance the analysis of both value iteration and Q-learning, we employ the switching system model of minimax Q-learning and the associated value iteration. This approach provides further insights into minimax Q-learning and facilitates a more straightforward and insightful convergence analysis. We anticipate that the introduction of these additional insights has the potential to uncover novel connections and foster collaboration between concepts in the fields of control theory and reinforcement learning communities.
    Two Independent Teachers are Better Role Model. (arXiv:2306.05745v1 [eess.IV])
    Recent deep learning models have attracted substantial attention in infant brain analysis. These models have achieved state-of-the-art performance using semi-supervised techniques (e.g., Temporal Ensembling, mean teacher). However, they depend on an encoder-decoder structure with stacked local operators to gather long-range information, and these local operators limit efficiency and effectiveness. Besides, the $MRI$ data contain different tissue properties ($TPs$) such as $T1$ and $T2$. One major limitation of these models is that they use both types of data as inputs to the segmentation process, i.e., the models are trained on the dataset once, which demands substantial computation and memory during inference. In this work, we address the above limitations by designing a new deep-learning model, called 3D-DenseUNet, which works as adaptable global aggregation blocks in down-sampling to solve the issue of spatial information loss. The self-attention module connects the down-sampling blocks to up-sampling blocks, and integrates the feature maps across spatial and channel dimensions, effectively improving the representation potential and discriminating ability of the model. Additionally, we propose a new method called Two Independent Teachers ($2IT$), which summarizes the model weights instead of label predictions. Each teacher model is trained on a different type of brain data, $T1$ or $T2$, respectively. Then, a fusion model is added to improve test accuracy and enable training with fewer parameters and labels compared to the Temporal Ensembling method, without modifying the network architecture. Empirical results demonstrate the effectiveness of the proposed method.
    Understanding How Consistency Works in Federated Learning via Stage-wise Relaxed Initialization. (arXiv:2306.05706v1 [cs.LG])
    Federated learning (FL) is a distributed paradigm that coordinates massive local clients to collaboratively train a global model via stage-wise local training processes on heterogeneous datasets. Previous works have observed that FL suffers from the ``client-drift'' problem, which is caused by the inconsistent optimum across local clients. However, a solid theoretical analysis explaining the impact of this local inconsistency has so far been lacking. To alleviate the negative impact of the ``client drift'' and explore its substance in FL, in this paper, we first design an efficient FL algorithm \textit{FedInit}, which allows employing a personalized relaxed initialization state at the beginning of each local training stage. Specifically, \textit{FedInit} initializes the local state by moving away from the current global state towards the reverse direction of the latest local state. This relaxed initialization helps to revise the local divergence and enhance the local consistency level. Moreover, to further understand how inconsistency disrupts performance in FL, we introduce an excess risk analysis and study the divergence term to investigate the test error of the proposed \textit{FedInit} method. Our studies show that the optimization error is not sensitive to this local inconsistency, which instead mainly affects the generalization error bound in \textit{FedInit}. Extensive experiments are conducted to validate this conclusion. Our proposed \textit{FedInit} achieves state-of-the-art~(SOTA) results compared to several advanced benchmarks without any additional costs. Meanwhile, stage-wise relaxed initialization can also be incorporated into current advanced algorithms to achieve higher performance in the FL paradigm.
    Weight Re-Mapping for Variational Quantum Algorithms. (arXiv:2306.05776v1 [quant-ph])
    Inspired by the remarkable success of artificial neural networks across a broad spectrum of AI tasks, variational quantum circuits (VQCs) have recently seen an upsurge in quantum machine learning applications. The promising outcomes shown by VQCs, such as improved generalization and reduced parameter training requirements, are attributed to the robust algorithmic capabilities of quantum computing. However, the current gradient-based training approaches for VQCs do not adequately accommodate the fact that trainable parameters (or weights) are typically used as angles in rotational gates. To address this, we extend the concept of weight re-mapping for VQCs, as introduced by K\"olle et al. (2023). This approach unambiguously maps the weights to an interval of length $2\pi$, mirroring data rescaling techniques in conventional machine learning that have proven to be highly beneficial in numerous scenarios. In our study, we employ seven distinct weight re-mapping functions to assess their impact on eight classification datasets, using variational classifiers as a representative example. Our results indicate that weight re-mapping can enhance the convergence speed of the VQC. We assess the efficacy of various re-mapping functions across all datasets and measure their influence on the VQC's average performance. Our findings indicate that weight re-mapping not only consistently accelerates the convergence of VQCs, regardless of the specific re-mapping function employed, but also significantly increases accuracy in certain cases.
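As a rough illustration of the idea (the abstract does not list K\"olle et al.'s actual functions, so both maps below are assumptions), a re-mapping simply squashes an unbounded trainable weight into an interval of length $2\pi$ before it is used as a rotation angle:

```python
import math

def remap_tanh(w):
    """Map an unbounded weight into (-pi, pi): an interval of length 2*pi."""
    return math.pi * math.tanh(w)

def remap_sigmoid(w):
    """Alternative: map into (0, 2*pi) via the logistic sigmoid."""
    return 2 * math.pi / (1 + math.exp(-w))

angles = [remap_tanh(w) for w in (-10.0, -1.0, 0.0, 1.0, 10.0)]
```

Both maps are smooth and monotone, so gradients pass through them unchanged in sign, mirroring the data-rescaling analogy the abstract draws.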
    Efficient GNN Explanation via Learning Removal-based Attribution. (arXiv:2306.05760v1 [cs.LG])
    As Graph Neural Networks (GNNs) have been widely used in real-world applications, model explanations are required not only by users but also by legal regulations. However, simultaneously achieving high fidelity and low computational costs in generating explanations has been a challenge for current methods. In this work, we propose a framework of GNN explanation named LeArn Removal-based Attribution (LARA) to address this problem. Specifically, we introduce removal-based attribution and demonstrate its substantiated link to interpretability fidelity theoretically and experimentally. The explainer in LARA learns to generate removal-based attribution which enables providing explanations with high fidelity. A strategy of subgraph sampling is designed in LARA to improve the scalability of the training process. In the deployment, LARA can efficiently generate the explanation through a feed-forward pass. We benchmark our approach with other state-of-the-art GNN explanation methods on six datasets. Results highlight the effectiveness of our framework regarding both efficiency and fidelity. In particular, LARA is 3.5 times faster and achieves higher fidelity than the state-of-the-art method on the large dataset ogbn-arxiv (more than 160K nodes and 1M edges), showing its great potential in real-world applications. Our source code is available at https://anonymous.4open.science/r/LARA-10D8/README.md.
    In-Sample Policy Iteration for Offline Reinforcement Learning. (arXiv:2306.05726v1 [cs.LG])
    Offline reinforcement learning (RL) seeks to derive an effective control policy from previously collected data. To circumvent errors due to inadequate data coverage, behavior-regularized methods optimize the control policy while concurrently minimizing deviation from the data collection policy. Nevertheless, these methods often exhibit subpar practical performance, particularly when the offline dataset is collected by sub-optimal policies. In this paper, we propose a novel algorithm employing in-sample policy iteration that substantially enhances behavior-regularized methods in offline RL. The core insight is that by continuously refining the policy used for behavior regularization, in-sample policy iteration gradually improves itself while implicitly avoiding queries of out-of-sample actions, averting catastrophic learning failures. Our theoretical analysis verifies its ability to learn the in-sample optimal policy, exclusively utilizing actions well-covered by the dataset. Moreover, we propose competitive policy improvement, a technique applying two competitive policies, both of which are trained by iteratively improving over the best competitor. We show that this simple yet potent technique significantly enhances learning efficiency when function approximation is applied. Lastly, experimental results on the D4RL benchmark indicate that our algorithm outperforms previous state-of-the-art methods in most tasks.
    Advancing Counterfactual Inference through Quantile Regression. (arXiv:2306.05751v1 [cs.LG])
    The capacity to address counterfactual "what if" inquiries is crucial for understanding and making use of causal influences. Traditional counterfactual inference usually assumes a structural causal model is available. However, in practice, such a causal model is often unknown and may not be identifiable. This paper aims to perform reliable counterfactual inference based on the (learned) qualitative causal structure and observational data, without a given causal model or even directly estimating conditional distributions. We re-cast counterfactual reasoning as an extended quantile regression problem using neural networks. The approach is statistically more efficient than existing ones, and further makes it possible to develop the generalization ability of the estimated counterfactual outcome to unseen data and provide an upper bound on the generalization error. Experiment results on multiple datasets strongly support our theoretical claims.
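The quantile-regression core can be illustrated with the pinball loss: minimizing it over a constant predictor recovers the empirical $\tau$-quantile. A toy sketch (not the paper's neural estimator):

```python
def pinball_loss(pred, ys, tau):
    """Average pinball (quantile) loss of a constant prediction pred."""
    total = 0.0
    for y in ys:
        diff = y - pred
        total += tau * diff if diff >= 0 else (tau - 1) * diff
    return total / len(ys)

ys = [1.0, 2.0, 3.0, 4.0, 100.0]  # heavy right tail: mean far from median
# Grid-search the constant minimizer for tau = 0.5; it recovers the median.
candidates = [c / 10 for c in range(0, 1001)]
median_hat = min(candidates, key=lambda c: pinball_loss(c, ys, 0.5))
```

The paper replaces the constant predictor with a neural network conditioned on covariates, so the same loss yields conditional quantiles rather than a single number.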
    The Role of Diverse Replay for Generalisation in Reinforcement Learning. (arXiv:2306.05727v1 [cs.LG])
    In reinforcement learning (RL), key components of many algorithms are the exploration strategy and replay buffer. These strategies regulate what environment data is collected and trained on and have been extensively studied in the RL literature. In this paper, we investigate the impact of these components in the context of generalisation in multi-task RL. We investigate the hypothesis that collecting and training on more diverse data from the training environment will improve zero-shot generalisation to new environments/tasks. We motivate mathematically and show empirically that generalisation to states that are "reachable" during training is improved by increasing the diversity of transitions in the replay buffer. Furthermore, we show empirically that this same strategy also shows improvement for generalisation to similar but "unreachable" states and could be due to improved generalisation of latent representations.
    Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion. (arXiv:2306.05708v1 [cs.SD])
    Denoising Diffusion Probabilistic Models have shown extraordinary ability on various generative tasks. However, their slow inference speed renders them impractical in speech synthesis. This paper proposes a linear diffusion model (LinDiff) based on an ordinary differential equation to simultaneously reach fast inference and high sample quality. Firstly, we employ linear interpolation between the target and noise to design a diffusion sequence for training, whereas in previous work the diffusion path linking noise and target is a curved segment. When decreasing the number of sampling steps (i.e., the number of line segments used to fit the path), the ease of fitting straight lines compared to curves allows us to generate higher-quality samples from random noise with fewer iterations. Secondly, to reduce computational complexity and achieve effective global modeling of noisy speech, LinDiff employs a patch-based processing approach that partitions the input signal into small patches. The patch-wise tokens leverage a Transformer architecture for effective modeling of global information. Adversarial training is used to further improve sample quality with fewer sampling steps. We test the proposed method on speech synthesis conditioned on acoustic features (Mel-spectrograms). Experimental results verify that our model can synthesize high-quality speech even with only one diffusion step. Both subjective and objective evaluations demonstrate that our model can synthesize speech of a quality comparable to that of autoregressive models with faster synthesis speed (3 diffusion steps).
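The straight-line forward process described above can be sketched in a few lines (a toy vector example under assumed notation, not LinDiff's actual implementation):

```python
def forward_linear(x0, noise, t):
    """Straight-line forward process: x_t = (1 - t) * x0 + t * noise."""
    return [(1 - t) * a + t * e for a, e in zip(x0, noise)]

def invert_linear(xt, noise, t):
    """With the noise known, the straight path inverts in a single step."""
    return [(v - t * e) / (1 - t) for v, e in zip(xt, noise)]

x0 = [0.5, -1.2, 2.0]      # toy "clean" signal
noise = [0.3, 0.7, -0.4]   # toy noise sample
xt = forward_linear(x0, noise, 0.6)
x0_hat = invert_linear(xt, noise, 0.6)
```

Because the path is a straight line, a piecewise-linear fit with very few segments is exact, which is the intuition behind generating good samples in only a handful of steps.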
    Explaining Predictive Uncertainty with Information Theoretic Shapley Values. (arXiv:2306.05724v1 [stat.ML])
    Researchers in explainable artificial intelligence have developed numerous methods for helping users understand the predictions of complex supervised learning models. By contrast, explaining the $\textit{uncertainty}$ of model outputs has received relatively little attention. We adapt the popular Shapley value framework to explain various types of predictive uncertainty, quantifying each feature's contribution to the conditional entropy of individual model outputs. We consider games with modified characteristic functions and find deep connections between the resulting Shapley values and fundamental quantities from information theory and conditional independence testing. We outline inference procedures for finite sample error rate control with provable guarantees, and implement an efficient algorithm that performs well in a range of experiments on real and simulated data. Our method has applications to covariate shift detection, active learning, feature selection, and active feature-value acquisition.
    A brief review of contrastive learning applied to astrophysics. (arXiv:2306.05528v1 [astro-ph.IM])
    Reliable tools to extract patterns from high-dimensionality spaces are becoming more necessary as astronomical datasets increase both in volume and complexity. Contrastive Learning is a self-supervised machine learning algorithm that extracts informative measurements from multi-dimensional datasets, which has become increasingly popular in the computer vision and Machine Learning communities in recent years. To do so, it maximizes the agreement between the information extracted from augmented versions of the same input data, making the final representation invariant to the applied transformations. Contrastive Learning is particularly useful in astronomy for removing known instrumental effects and for performing supervised classifications and regressions with a limited amount of available labels, showing a promising avenue towards \emph{Foundation Models}. This short review paper briefly summarizes the main concepts behind contrastive learning and reviews the first promising applications to astronomy. We include some practical recommendations on which applications are particularly attractive for contrastive learning.
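The agreement-maximization the review describes is typically implemented with an InfoNCE-style loss; a minimal pure-Python sketch on toy embeddings (illustrative only, not any specific astronomy pipeline):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Cross-entropy of identifying the positive among positive + negatives."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = [s / temperature for s in sims]
    m = max(logits)  # stabilize the softmax
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[0] / sum(exps))

anchor = [1.0, 0.0]
good = info_nce(anchor, [0.9, 0.1], negatives=[[0.0, 1.0], [-1.0, 0.2]])
bad = info_nce(anchor, [0.0, 1.0], negatives=[[0.9, 0.1], [-1.0, 0.2]])
```

Minimizing this loss pulls augmented views of the same object together and pushes other objects apart, making the representation invariant to the applied transformations, as the review notes.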
    MC-NN: An End-to-End Multi-Channel Neural Network Approach for Predicting Influenza A Virus Hosts and Antigenic Types. (arXiv:2306.05587v1 [cs.LG])
    Influenza poses a significant threat to public health, particularly among the elderly, young children, and people with underlying diseases. The manifestation of severe conditions, such as pneumonia, highlights the importance of preventing the spread of influenza. An accurate and cost-effective prediction of the host and antigenic sub-types of influenza A viruses is essential to addressing this issue, particularly in resource-constrained regions. In this study, we propose a multi-channel neural network model to predict the host and antigenic subtypes of influenza A viruses from hemagglutinin and neuraminidase protein sequences. Our model was trained on a comprehensive data set of complete protein sequences and evaluated on various test data sets of complete and incomplete sequences. The results demonstrate the potential and practicality of using multi-channel neural networks in predicting the host and antigenic subtypes of influenza A viruses from both full and partial protein sequences.
    Specifying and Solving Robust Empirical Risk Minimization Problems Using CVXPY. (arXiv:2306.05649v1 [math.OC])
    We consider robust empirical risk minimization (ERM), where model parameters are chosen to minimize the worst-case empirical loss when each data point varies over a given convex uncertainty set. In some simple cases, such problems can be expressed in an analytical form. In general, the problem can be made tractable via dualization, which turns a min-max problem into a min-min problem. Dualization requires expertise and is tedious and error-prone. We demonstrate how CVXPY can be used to automate this dualization procedure in a user-friendly manner. Our framework allows practitioners to specify and solve robust ERM problems with a general class of convex losses, capturing many standard regression and classification problems. Users can easily specify any complex uncertainty set that is representable via disciplined convex programming (DCP) constraints.
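For intuition on the "simple cases with an analytical form": with absolute loss and an $\ell_\infty$ uncertainty ball of radius $\rho$ around each feature vector, the inner maximization reduces to $|x^\top\theta - y| + \rho\|\theta\|_1$. The NumPy sketch below is my own illustrative example of minimizing that robustified loss by subgradient descent, not the paper's CVXPY code:

```python
import numpy as np

def robust_abs_loss(theta, X, y, rho):
    """Worst-case absolute loss when each x_i may be perturbed within an
    l_inf ball of radius rho; the inner sup has the analytic form
    |x^T theta - y| + rho * ||theta||_1."""
    return np.abs(X @ theta - y).mean() + rho * np.abs(theta).sum()

def fit_robust(X, y, rho, steps=2000, lr=0.01):
    """Subgradient descent on the robustified loss."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        r = X @ theta - y
        grad = X.T @ np.sign(r) / len(y) + rho * np.sign(theta)
        theta -= lr * grad
    return theta

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.0, 3.0]) + 0.1 * rng.normal(size=200)
theta_nom = fit_robust(X, y, rho=0.0)   # nominal ERM
theta_rob = fit_robust(X, y, rho=0.5)   # robust ERM
# robustness acts like l1 regularization: coefficients shrink
print(np.abs(theta_rob).sum() < np.abs(theta_nom).sum())
```

The general case with arbitrary DCP-representable uncertainty sets has no such closed form, which is the gap the paper's automated dualization addresses.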
    Decentralized Randomly Distributed Multi-agent Multi-armed Bandit with Heterogeneous Rewards. (arXiv:2306.05579v1 [cs.LG])
    We study a decentralized multi-agent multi-armed bandit problem in which multiple clients are connected by time-dependent random graphs provided by an environment. The reward distributions of each arm vary across clients and rewards are generated independently over time by an environment based on distributions that include both sub-exponential and sub-gaussian distributions. Each client pulls an arm and communicates with neighbors based on the graph provided by the environment. The goal is to minimize the overall regret of the entire system through collaborations. To this end, we introduce a novel algorithmic framework, which first provides robust simulation methods for generating random graphs using rapidly mixing Markov chains or the random graph model, and then combines an averaging-based consensus approach with a newly proposed weighting technique and the upper confidence bound to deliver a UCB-type solution. Our algorithms account for the randomness in the graphs, removing the conventional double stochasticity assumption, and only require the knowledge of the number of clients at initialization. We derive optimal instance-dependent regret upper bounds of order $\log{T}$ in both sub-gaussian and sub-exponential environments, and a nearly optimal mean-gap independent regret upper bound of order $\sqrt{T}\log T$ up to a $\log T$ factor. Importantly, our regret bounds hold with high probability and capture graph randomness, whereas prior works consider expected regret under assumptions and require more stringent reward distributions.
    Deep Learning for Day Forecasts from Sparse Observations. (arXiv:2306.06079v1 [physics.ao-ph])
    Deep neural networks offer an alternative paradigm for modeling weather conditions. The ability of neural models to make a prediction in less than a second once the data is available and to do so with very high temporal and spatial resolution, and the ability to learn directly from atmospheric observations, are just some of these models' unique advantages. Neural models trained using atmospheric observations, the highest fidelity and lowest latency data, have to date achieved good performance only up to twelve hours of lead time when compared with state-of-the-art probabilistic Numerical Weather Prediction models and only for the sole variable of precipitation. In this paper, we present MetNet-3, which significantly extends both the lead time range and the variables that an observation-based neural model can predict well. MetNet-3 learns from both dense and sparse data sensors and makes predictions up to 24 hours ahead for precipitation, wind, temperature and dew point. MetNet-3 introduces a key densification technique that implicitly captures data assimilation and produces spatially dense forecasts in spite of the network training on extremely sparse targets. MetNet-3 has a high temporal and spatial resolution of, respectively, up to 2 minutes and 1 km as well as a low operational latency. We find that MetNet-3 is able to outperform the best single- and multi-member NWPs such as HRRR and ENS over the CONUS region for up to 24 hours ahead, setting a new performance milestone for observation-based neural models.
    Multi-level Cross-modal Feature Alignment via Contrastive Learning towards Zero-shot Classification of Remote Sensing Image Scenes. (arXiv:2306.06066v1 [cs.CV])
    Zero-shot classification of image scenes, which can recognize image scenes not seen in the training stage, holds great promise for lowering the dependence on large numbers of labeled samples. To address zero-shot image scene classification, cross-modal feature alignment methods have been proposed in recent years. These methods mainly focus on matching the visual features of each image scene with their corresponding semantic descriptors in the latent space. Less attention has been paid to the contrastive relationships between different image scenes and different semantic descriptors. Given the large intra-class differences and inter-class similarities among image scenes, as well as potentially noisy samples, these methods are susceptible to the influence of instances that are far from those of the same class and close to those of other classes. In this work, we propose a multi-level cross-modal feature alignment method via contrastive learning for zero-shot classification of remote sensing image scenes. While promoting the single-instance-level positive alignment between each image scene and its corresponding semantic descriptors, the proposed method takes the cross-instance contrastive relationships into consideration, and learns to keep the visual and semantic features of different classes in the latent space apart from each other. Extensive experiments have been done to evaluate the performance of the proposed method. The results show that our proposed method outperforms state-of-the-art methods for zero-shot remote sensing image scene classification. All the code and data are available at https://github.com/masuqiang/MCFA-Pytorch
    Prediction of Transportation Index for Urban Patterns in Small and Medium-sized Indian Cities using Hybrid RidgeGAN Model. (arXiv:2306.05951v1 [cs.LG])
    The rapid urbanization trend in most developing countries including India is creating a plethora of civic concerns such as loss of green space, degradation of environmental health, clean water availability, air pollution, traffic congestion leading to delays in vehicular transportation, etc. Transportation and network modeling through transportation indices have been widely used to understand transportation problems in the recent past. This necessitates predicting transportation indices to facilitate sustainable urban planning and traffic management. Recent advancements in deep learning research, in particular, Generative Adversarial Networks (GANs), and their modifications in spatial data analysis such as CityGAN, Conditional GAN, and MetroGAN have enabled urban planners to simulate hyper-realistic urban patterns. These synthetic urban universes mimic global urban patterns and evaluating their landscape structures through spatial pattern analysis can aid in comprehending landscape dynamics, thereby enhancing sustainable urban planning. This research addresses several challenges in predicting the urban transportation index for small and medium-sized Indian cities. A hybrid framework based on Kernel Ridge Regression (KRR) and CityGAN is introduced to predict transportation index using spatial indicators of human settlement patterns. This paper establishes a relationship between the transportation index and human settlement indicators and models it using KRR for the selected 503 Indian cities. The proposed hybrid pipeline, which we call RidgeGAN, can evaluate the sustainability of urban sprawl associated with infrastructure development and transportation systems in sprawling cities. Experimental results show that the two-step pipeline approach outperforms existing benchmarks based on spatial and statistical measures.
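Kernel Ridge Regression, the KRR component above, has a simple closed form: $\alpha = (K + \lambda I)^{-1} y$, with predictions $f(x) = k(x, X_{\text{train}})\,\alpha$. A minimal NumPy sketch with an RBF kernel on toy 1-D data (the data and hyperparameters are illustrative, not the paper's):

```python
import numpy as np

def krr_fit_predict(X_train, y_train, X_test, lam=1e-2, gamma=10.0):
    """Kernel Ridge Regression with an RBF kernel.

    Closed form: alpha = (K + lam*I)^{-1} y,  f(x) = k(x, X_train) @ alpha.
    """
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    K = rbf(X_train, X_train)
    alpha = np.linalg.solve(K + lam * np.eye(len(K)), y_train)
    return rbf(X_test, X_train) @ alpha

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(100, 1))
y = np.sin(3 * X[:, 0])            # smooth nonlinear target
pred = krr_fit_predict(X, y, X)    # in-sample fit
rmse = np.sqrt(np.mean((pred - y) ** 2))
print(rmse)  # small in-sample error
```

In the paper's pipeline the inputs would be spatial human-settlement indicators rather than this synthetic 1-D signal.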
    How Does Fine-Tuning Impact Out-of-Distribution Detection for Vision-Language Models? (arXiv:2306.06048v1 [cs.CV])
    Recent large vision-language models such as CLIP have shown remarkable out-of-distribution (OOD) detection and generalization performance. However, their zero-shot in-distribution (ID) accuracy is often limited for downstream datasets. Recent CLIP-based fine-tuning methods such as prompt learning have demonstrated significant improvements in ID classification and OOD generalization where OOD labels are available. Nonetheless, it remains unclear whether the model is reliable under semantic shifts without OOD labels. In this paper, we aim to bridge the gap and present a comprehensive study to understand how fine-tuning impacts OOD detection for few-shot downstream tasks. By framing OOD detection as multi-modal concept matching, we establish a connection between fine-tuning methods and various OOD scores. Our results suggest that a proper choice of OOD scores is essential for CLIP-based fine-tuning. In particular, the maximum concept matching (MCM) score consistently provides a promising solution. We also show that prompt learning demonstrates state-of-the-art OOD detection performance over the zero-shot counterpart.
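The maximum concept matching (MCM) score is, in essence, the largest softmax probability over the cosine similarities between an image embedding and the class ("concept") text embeddings. A minimal NumPy sketch with synthetic embeddings (the temperature, dimensions, and data are illustrative assumptions, not CLIP outputs):

```python
import numpy as np

def mcm_score(img_emb, concept_embs, tau=0.1):
    """Maximum Concept Matching: softmax over cosine similarities to the
    class (concept) embeddings; the max probability is the ID score."""
    img = img_emb / np.linalg.norm(img_emb)
    C = concept_embs / np.linalg.norm(concept_embs, axis=1, keepdims=True)
    logits = (C @ img) / tau
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return p.max()   # high -> likely in-distribution

rng = np.random.default_rng(3)
concepts = rng.normal(size=(10, 64))
id_img = concepts[0] + 0.1 * rng.normal(size=64)  # close to a known concept
ood_img = rng.normal(size=64)                     # matches no concept well
id_s = mcm_score(id_img, concepts)
ood_s = mcm_score(ood_img, concepts)
print(id_s > ood_s)
```

Thresholding this score separates ID from OOD inputs: an ID image aligns sharply with one concept, while an OOD image spreads its similarity across all of them.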
    How Sparse Can We Prune A Deep Network: A Geometric Viewpoint. (arXiv:2306.05857v1 [stat.ML])
    Overparameterization constitutes one of the most significant hallmarks of deep neural networks. Though it can offer the advantage of outstanding generalization performance, it meanwhile imposes a substantial storage burden, thus necessitating the study of network pruning. A natural and fundamental question is: How sparse can we prune a deep network (with almost no loss in performance)? To address this problem, in this work we take a first-principles approach: specifically, by merely enforcing the sparsity constraint on the original loss function, we are able to characterize the sharp phase transition point of the pruning ratio, which corresponds to the boundary between the feasible and the infeasible, from the perspective of high-dimensional geometry. It turns out that the phase transition point of the pruning ratio equals the squared Gaussian width of some convex body resulting from the $l_1$-regularized loss function, normalized by the original dimension of parameters. As a byproduct, we provide a novel network pruning algorithm which is essentially a global one-shot pruning algorithm. Furthermore, we provide efficient countermeasures to address the challenges in computing the involved Gaussian width, including the spectrum estimation of a large-scale Hessian matrix and handling Hessian matrices that are not positive definite. It is demonstrated that the predicted pruning ratio threshold coincides very well with the actual value obtained from the experiments and our proposed pruning algorithm can achieve competitive or even better performance than the existing pruning algorithms. All codes are available at: https://github.com/QiaozheZhang/Global-One-shot-Pruning
    Causality between Sentiment and Cryptocurrency Prices. (arXiv:2306.05803v1 [q-fin.CP])
    This study investigates the relationship between narratives conveyed through microblogging platforms, namely Twitter, and the value of crypto assets. Our study provides a unique technique to build narratives about cryptocurrency by combining topic modelling of short texts with sentiment analysis. First, we used an unsupervised machine learning algorithm to discover the latent topics within the massive and noisy textual data from Twitter, and then we revealed 4-5 cryptocurrency-related narratives, including financial investment, technological advancement related to crypto, financial and political regulations, crypto assets, and media coverage. In a number of situations, we noticed a strong link between our narratives and crypto prices. Our work connects the most recent innovation in economics, Narrative Economics, to a new area of study that combines topic modelling and sentiment analysis to relate consumer behaviour to narratives.
    DP-HyPO: An Adaptive Private Hyperparameter Optimization Framework. (arXiv:2306.05734v1 [cs.LG])
    Hyperparameter optimization, also known as hyperparameter tuning, is a widely recognized technique for improving model performance. Regrettably, when training private ML models, many practitioners often overlook the privacy risks associated with hyperparameter optimization, which could potentially expose sensitive information about the underlying dataset. Currently, the sole existing approach to allow privacy-preserving hyperparameter optimization is to uniformly and randomly select hyperparameters for a number of runs, subsequently reporting the best-performing hyperparameter. In contrast, in non-private settings, practitioners commonly utilize "adaptive" hyperparameter optimization methods such as Gaussian process-based optimization, which select the next candidate based on information gathered from previous outputs. This substantial contrast between private and non-private hyperparameter optimization underscores a critical concern. In our paper, we introduce DP-HyPO, a pioneering framework for "adaptive" private hyperparameter optimization, aiming to bridge the gap between private and non-private hyperparameter optimization. To accomplish this, we provide a comprehensive differential privacy analysis of our framework. Furthermore, we empirically demonstrate the effectiveness of DP-HyPO on a diverse set of real-world and synthetic datasets.
    SEGA: Structural Entropy Guided Anchor View for Graph Contrastive Learning. (arXiv:2305.04501v2 [cs.LG] UPDATED)
    In contrastive learning, the choice of ``view'' controls the information that the representation captures and influences the performance of the model. However, leading graph contrastive learning methods generally produce views via random corruption or learning, which could lead to the loss of essential information and alteration of semantic information. An anchor view that maintains the essential information of input graphs for contrastive learning has been hardly investigated. In this paper, based on the theory of graph information bottleneck, we deduce the definition of this anchor view; put differently, \textit{the anchor view with essential information of input graph is supposed to have the minimal structural uncertainty}. Furthermore, guided by structural entropy, we implement the anchor view, termed \textbf{SEGA}, for graph contrastive learning. We extensively validate the proposed anchor view on various benchmarks regarding graph classification under unsupervised, semi-supervised, and transfer learning and achieve significant performance boosts compared to the state-of-the-art methods.  ( 2 min )
    SLaM: Student-Label Mixing for Distillation with Unlabeled Examples. (arXiv:2302.03806v2 [cs.LG] UPDATED)
    Knowledge distillation with unlabeled examples is a powerful training paradigm for generating compact and lightweight student models in applications where the amount of labeled data is limited but one has access to a large pool of unlabeled data. In this setting, a large teacher model generates ``soft'' pseudo-labels for the unlabeled dataset which are then used for training the student model. Despite its success in a wide variety of applications, a shortcoming of this approach is that the teacher's pseudo-labels are often noisy, leading to impaired student performance. In this paper, we present a principled method for knowledge distillation with unlabeled examples that we call Student-Label Mixing (SLaM) and we show that it consistently improves over prior approaches by evaluating it on several standard benchmarks. Finally, we show that SLaM comes with theoretical guarantees; along the way we give an algorithm improving the best-known sample complexity for learning halfspaces with margin under random classification noise, and provide the first convergence analysis for so-called ``forward loss-adjustment" methods.  ( 2 min )
    10 Security and Privacy Problems in Large Foundation Models. (arXiv:2110.15444v3 [cs.CR] UPDATED)
    Foundation models--such as GPT, CLIP, and DINO--have achieved revolutionary progress in the past several years and are commonly believed to be a promising approach for general-purpose AI. In particular, self-supervised learning is adopted to pre-train a foundation model using a large amount of unlabeled data. A pre-trained foundation model is like an ``operating system'' of the AI ecosystem. Specifically, a foundation model can be used as a feature extractor for many downstream tasks with little or no labeled training data. Existing studies on foundation models mainly focused on pre-training a better foundation model to improve its performance on downstream tasks in non-adversarial settings, leaving its security and privacy in adversarial settings largely unexplored. A security or privacy issue of a pre-trained foundation model leads to a single point of failure for the AI ecosystem. In this book chapter, we discuss 10 basic security and privacy problems for the pre-trained foundation models, including six confidentiality problems, three integrity problems, and one availability problem. For each problem, we discuss potential opportunities and challenges. We hope our book chapter will inspire future research on the security and privacy of foundation models.  ( 2 min )
    Action Matching: Learning Stochastic Dynamics from Samples. (arXiv:2210.06662v3 [cs.LG] UPDATED)
    Learning the continuous dynamics of a system from snapshots of its temporal marginals is a problem which appears throughout natural sciences and machine learning, including in quantum systems, single-cell biological data, and generative modeling. In these settings, we assume access to cross-sectional samples that are uncorrelated over time, rather than full trajectories of samples. In order to better understand the systems under observation, we would like to learn a model of the underlying process that allows us to propagate samples in time and thereby simulate entire individual trajectories. In this work, we propose Action Matching, a method for learning a rich family of dynamics using only independent samples from its time evolution. We derive a tractable training objective, which does not rely on explicit assumptions about the underlying dynamics and does not require back-propagation through differential equations or optimal transport solvers. Inspired by connections with optimal transport, we derive extensions of Action Matching to learn stochastic differential equations and dynamics involving creation and destruction of probability mass. Finally, we showcase applications of Action Matching by achieving competitive performance in a diverse set of experiments from biology, physics, and generative modeling.  ( 2 min )
    Achieving the Pareto Frontier of Regret Minimization and Best Arm Identification in Multi-Armed Bandits. (arXiv:2110.08627v3 [cs.LG] UPDATED)
    We study the Pareto frontier of two archetypal objectives in multi-armed bandits, namely, regret minimization (RM) and best arm identification (BAI) with a fixed horizon. It is folklore that the balance between exploitation and exploration is crucial for both RM and BAI, but exploration is more critical in achieving the optimal performance for the latter objective. To this end, we design and analyze the BoBW-lil'UCB$(\gamma)$ algorithm. Complementarily, by establishing lower bounds on the regret achievable by any algorithm with a given BAI failure probability, we show that (i) no algorithm can simultaneously perform optimally for both the RM and BAI objectives, and (ii) BoBW-lil'UCB$(\gamma)$ achieves order-wise optimal performance for RM or BAI under different values of $\gamma$. Our work elucidates the trade-off more precisely by showing how the constants in previous works depend on certain hardness parameters. Finally, we show that BoBW-lil'UCB outperforms a close competitor UCB$_\alpha$ (Degenne et al., 2019) in terms of the time complexity and the regret on diverse datasets such as MovieLens and Published Kinase Inhibitor Set.  ( 2 min )
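For readers unfamiliar with the UCB family, here is a minimal NumPy sketch of vanilla UCB1 on Bernoulli arms, a much simpler relative of BoBW-lil'UCB$(\gamma)$, shown only to illustrate the upper-confidence-bound principle (the arm means and horizon are illustrative):

```python
import numpy as np

def ucb1(means, T, seed=0):
    """Vanilla UCB1 on Bernoulli arms: pull the arm maximizing
    empirical mean + sqrt(2 log t / pulls)."""
    rng = np.random.default_rng(seed)
    K = len(means)
    counts = np.zeros(K)
    sums = np.zeros(K)
    for t in range(T):
        if t < K:
            a = t  # pull each arm once to initialize
        else:
            ucb = sums / counts + np.sqrt(2 * np.log(t + 1) / counts)
            a = int(np.argmax(ucb))
        sums[a] += rng.random() < means[a]  # Bernoulli reward
        counts[a] += 1
    return counts

means = [0.2, 0.5, 0.8]
counts = ucb1(means, T=5000)
print(counts)  # the best arm (index 2) receives most of the pulls
```

Regret minimization concentrates pulls on the best arm as above; BAI-oriented variants inflate the exploration bonus (the role of $\gamma$ in the paper) to spend more pulls distinguishing the top arms.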
    Virtual Node Tuning for Few-shot Node Classification. (arXiv:2306.06063v1 [cs.LG])
    Few-shot Node Classification (FSNC) is a challenge in graph representation learning where only a few labeled nodes per class are available for training. To tackle this issue, meta-learning has been proposed to transfer structural knowledge from base classes with abundant labels to target novel classes. However, existing solutions become ineffective or inapplicable when base classes have no or limited labeled nodes. To address this challenge, we propose an innovative method dubbed Virtual Node Tuning (VNT). Our approach utilizes a pretrained graph transformer as the encoder and injects virtual nodes as soft prompts in the embedding space, which can be optimized with few-shot labels in novel classes to modulate node embeddings for each specific FSNC task. A unique feature of VNT is that, by incorporating a Graph-based Pseudo Prompt Evolution (GPPE) module, VNT-GPPE can handle scenarios with sparse labels in base classes. Experimental results on four datasets demonstrate the superiority of the proposed approach in addressing FSNC with unlabeled or sparsely labeled base classes, outperforming existing state-of-the-art methods and even fully supervised baselines.  ( 2 min )
    DeepStay: Stay Region Extraction from Location Trajectories using Weak Supervision. (arXiv:2306.06068v1 [cs.CV])
    Nowadays, mobile devices enable constant tracking of the user's position and location trajectories can be used to infer personal points of interest (POIs) like homes, workplaces, or stores. A common way to extract POIs is to first identify spatio-temporal regions where a user spends a significant amount of time, known as stay regions (SRs). Common approaches to SR extraction are evaluated either solely unsupervised or on a small-scale private dataset, as popular public datasets are unlabeled. Most of these methods rely on hand-crafted features or thresholds and do not learn beyond hyperparameter optimization. Therefore, we propose a weakly and self-supervised transformer-based model called DeepStay, which is trained on location trajectories to predict stay regions. To the best of our knowledge, this is the first approach based on deep learning and the first approach that is evaluated on a public, labeled dataset. Our SR extraction method outperforms state-of-the-art methods. In addition, we conducted a limited experiment on the task of transportation mode detection from GPS trajectories using the same architecture and achieved significantly higher scores than the state-of-the-art. Our code is available at https://github.com/christianll9/deepstay.  ( 2 min )
    Demonstration-free Autonomous Reinforcement Learning via Implicit and Bidirectional Curriculum. (arXiv:2305.09943v2 [cs.LG] UPDATED)
    While reinforcement learning (RL) has achieved great success in acquiring complex skills solely from environmental interactions, it assumes that resets to the initial state are readily available at the end of each episode. Such an assumption hinders the autonomous learning of embodied agents due to the time-consuming and cumbersome workarounds for resetting in the physical world. Hence, there has been a growing interest in autonomous RL (ARL) methods that are capable of learning from non-episodic interactions. However, existing works on ARL are limited by their reliance on prior data and are unable to learn in environments where task-relevant interactions are sparse. In contrast, we propose a demonstration-free ARL algorithm via Implicit and Bi-directional Curriculum (IBC). With an auxiliary agent that is conditionally activated upon learning progress and a bidirectional goal curriculum based on optimal transport, our method outperforms previous methods, even the ones that leverage demonstrations.  ( 2 min )
    Automatic Change-Point Detection in Time Series via Deep Learning. (arXiv:2211.03860v2 [stat.ML] UPDATED)
    Detecting change-points in data is challenging because of the range of possible types of change and types of behaviour of data when there is no change. Statistically efficient methods for detecting a change will depend on both of these features, and it can be difficult for a practitioner to develop an appropriate detection method for their application of interest. We show how to automatically generate new offline detection methods based on training a neural network. Our approach is motivated by many existing tests for the presence of a change-point being representable by a simple neural network, and thus a neural network trained with sufficient data should have performance at least as good as these methods. We present theory that quantifies the error rate for such an approach, and how it depends on the amount of training data. Empirical results show that, even with limited training data, its performance is competitive with the standard CUSUM-based classifier for detecting a change in mean when the noise is independent and Gaussian, and can substantially outperform it in the presence of auto-correlated or heavy-tailed noise. Our method also shows strong results in detecting and localising changes in activity based on accelerometer data.  ( 2 min )
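The classical CUSUM baseline mentioned above, for a single change in mean, scans candidate split points and maximizes the standardized difference of segment means. A minimal NumPy sketch on synthetic Gaussian data (not the paper's neural method):

```python
import numpy as np

def cusum_change_point(x):
    """CUSUM statistic for one change in mean: at each split t, compare
    the means of x[:t] and x[t:] with a variance-stabilizing weight."""
    n = len(x)
    best_t, best_stat = None, -np.inf
    for t in range(1, n):
        left, right = x[:t], x[t:]
        stat = np.sqrt(t * (n - t) / n) * abs(left.mean() - right.mean())
        if stat > best_stat:
            best_t, best_stat = t, stat
    return best_t, best_stat

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(1.5, 1, 200)])
t_hat, stat = cusum_change_point(x)
print(t_hat)  # close to the true change point at index 200
```

Comparing `stat` to a threshold gives a detection decision; the paper's point is that a trained network can match this classifier under Gaussian noise and beat it under auto-correlated or heavy-tailed noise.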
    Clustering an African Hairstyle Dataset using PCA and K-means. (arXiv:2306.06061v1 [cs.CV])
    Digital transformation has yet to be applied to building an African face shape classifier. African women rely on beauty standards recommendations, personal preference, or the newest trends in hairstyles to decide on the appropriate hairstyle for them. In this paper, an approach is presented that uses K-means clustering to classify African women's images. In order to identify potential facial clusters, Haar cascade is used for feature-based training, and K-means clustering is applied for image classification.  ( 2 min )
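A minimal NumPy sketch of a PCA-plus-K-means pipeline on synthetic feature vectors standing in for image features (the data, dimensions, and cluster count are illustrative assumptions, and the Haar cascade detection step is omitted):

```python
import numpy as np

def pca(X, k):
    """Project X onto its top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm with random data-point initialization."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

rng = np.random.default_rng(5)
# two well-separated synthetic "image feature" blobs in 50-D
A = rng.normal(0, 1, size=(60, 50)) + 4
B = rng.normal(0, 1, size=(60, 50)) - 4
X = np.vstack([A, B])
labels = kmeans(pca(X, 2), 2)
# each true blob should map to a single cluster
print(labels[0] != labels[60])
```

Reducing with PCA before K-means keeps the clustering step cheap and removes low-variance noise directions, which is presumably the role PCA plays in the titled pipeline.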
    Deep Laplacian-based Options for Temporally-Extended Exploration. (arXiv:2301.11181v2 [cs.LG] UPDATED)
    Selecting exploratory actions that generate a rich stream of experience for better learning is a fundamental challenge in reinforcement learning (RL). An approach to tackle this problem consists in selecting actions according to specific policies for an extended period of time, also known as options. A recent line of work to derive such exploratory options builds upon the eigenfunctions of the graph Laplacian. Importantly, until now these methods have been mostly limited to tabular domains where (1) the graph Laplacian matrix was either given or could be fully estimated, (2) performing eigendecomposition on this matrix was computationally tractable, and (3) value functions could be learned exactly. Additionally, these methods required a separate option discovery phase. These assumptions are fundamentally not scalable. In this paper we address these limitations and show how recent results for directly approximating the eigenfunctions of the Laplacian can be leveraged to truly scale up options-based exploration. To do so, we introduce a fully online deep RL algorithm for discovering Laplacian-based options and evaluate our approach on a variety of pixel-based tasks. We compare to several state-of-the-art exploration methods and show that our approach is effective, general, and especially promising in non-stationary settings.  ( 2 min )
    Multi-Epoch Matrix Factorization Mechanisms for Private Machine Learning. (arXiv:2211.06530v2 [cs.LG] UPDATED)
    We introduce new differentially private (DP) mechanisms for gradient-based machine learning (ML) with multiple passes (epochs) over a dataset, substantially improving the achievable privacy-utility-computation tradeoffs. We formalize the problem of DP mechanisms for adaptive streams with multiple participations and introduce a non-trivial extension of online matrix factorization DP mechanisms to our setting. This includes establishing the necessary theory for sensitivity calculations and efficient computation of optimal matrices. For some applications like $>\!\! 10,000$ SGD steps, applying these optimal techniques becomes computationally expensive. We thus design an efficient Fourier-transform-based mechanism with only a minor utility loss. Extensive empirical evaluation on both example-level DP for image classification and user-level DP for language modeling demonstrate substantial improvements over all previous methods, including the widely-used DP-SGD. Though our primary application is to ML, our main DP results are applicable to arbitrary linear queries and hence may have much broader applicability.  ( 2 min )
    An Adaptive Algorithm for Learning with Unknown Distribution Drift. (arXiv:2305.02252v2 [cs.LG] UPDATED)
    We develop and analyze a general technique for learning with an unknown distribution drift. Given a sequence of independent observations from the last $T$ steps of a drifting distribution, our algorithm agnostically learns a family of functions with respect to the current distribution at time $T$. Unlike previous work, our technique does not require prior knowledge about the magnitude of the drift. Instead, the algorithm adapts to the sample data. Without explicitly estimating the drift, the algorithm learns a family of functions with almost the same error as a learning algorithm that knows the magnitude of the drift in advance. Furthermore, since our algorithm adapts to the data, it can guarantee a better learning error than an algorithm that relies on loose bounds on the drift.  ( 2 min )
    Customs Import Declaration Datasets. (arXiv:2208.02484v2 [cs.LG] UPDATED)
    Given the huge volume of cross-border flows, effective and efficient control of trade becomes more crucial in protecting people and society from illicit trade. However, limited accessibility of the transaction-level trade datasets hinders the progress of open research, and many customs administrations have not benefited from the recent progress in data-based risk management. In this paper, we introduce an import declaration dataset to facilitate the collaboration between domain experts in customs administrations and researchers from diverse domains, such as data science and machine learning. The dataset contains 54,000 artificially generated trades with 22 key attributes, and it is synthesized with a conditional tabular GAN while maintaining correlated features. Synthetic data has several advantages. First, releasing the dataset is free from restrictions that do not allow disclosing the original import data. The fabrication step minimizes the possible identity risk which may exist in trade statistics. Second, the published data follow a similar distribution to the source data so that it can be used in various downstream tasks. Hence, our dataset can be used as a benchmark for testing the performance of any classification algorithm. With the provision of data and its generation process, we open baseline codes for fraud detection tasks, as we empirically show that more advanced algorithms can better detect fraud.  ( 2 min )
    Near-Optimal Algorithms for Private Online Learning in a Stochastic Environment. (arXiv:2102.07929v2 [cs.LG] UPDATED)
    We consider two variants of private stochastic online learning. The first variant is differentially private stochastic bandits. Previously, Sajed and Sheffet (2019) devised the DP Successive Elimination (DP-SE) algorithm that achieves the optimal $ O \biggl(\sum\limits_{1\le j \le K: \Delta_j >0} \frac{ \log T}{ \Delta_j} + \frac{ K\log T}{\epsilon} \biggr)$ problem-dependent regret bound, where $K$ is the number of arms, $\Delta_j$ is the mean reward gap of arm $j$, $T$ is the time horizon, and $\epsilon$ is the required privacy parameter. However, like other elimination style algorithms, it is not an anytime algorithm. Until now, it was not known whether UCB-based algorithms could achieve this optimal regret bound. We present an anytime, UCB-based algorithm that achieves optimality. Our experiments show that the UCB-based algorithm is competitive with DP-SE. The second variant is the full information version of private stochastic online learning. Specifically, for the problem of decision-theoretic online learning with stochastic rewards, we present the first algorithm that achieves an $ O \left( \frac{ \log K}{ \Delta_{\min}} + \frac{\log(K) \min\{\log (\frac{1}{\Delta_{\min}}), \log(T)\}}{\epsilon} \right)$ regret bound, where $\Delta_{\min}$ is the minimum mean reward gap. In addition, we also show an $\Omega \left( \max\left\{ \frac{\log K}{\Delta_{\min}}, \frac{\log K}{\epsilon} \right\} \right)$ problem-dependent lower bound. The key idea behind our good theoretical guarantees in both settings is forgetfulness, i.e., decisions are made based on a certain amount of newly obtained observations instead of all the observations obtained from the very beginning.  ( 3 min )
    Causal Deep Reinforcement Learning Using Observational Data. (arXiv:2211.15355v2 [cs.LG] UPDATED)
    Deep reinforcement learning (DRL) requires the collection of interventional data, which is sometimes expensive and even unethical in the real world, such as in autonomous driving and the medical field. Offline reinforcement learning promises to alleviate this issue by exploiting the vast amount of observational data available in the real world. However, observational data may mislead the learning agent to undesirable outcomes if the behavior policy that generates the data depends on unobserved random variables (i.e., confounders). In this paper, we propose two deconfounding methods in DRL to address this problem. The methods first calculate the importance degree of different samples based on causal inference techniques, and then adjust the impact of different samples on the loss function by reweighting or resampling the offline dataset to ensure its unbiasedness. These deconfounding methods can be flexibly combined with existing model-free DRL algorithms such as soft actor-critic and deep Q-learning, provided that a weak condition is satisfied by the loss functions of these algorithms. We prove the effectiveness of our deconfounding methods and validate them experimentally.  ( 2 min )
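The reweighting idea can be shown in miniature: scale each offline sample's contribution to the loss by an importance degree. The weights below are hypothetical values given directly; in the paper they come from a causal-inference step.

```python
def reweighted_loss(samples, weights, loss_fn):
    """Importance-weighted average loss over an offline batch."""
    total_w = sum(weights)
    return sum(w * loss_fn(s) for s, w in zip(samples, weights)) / total_w

# toy batch with hypothetical importance degrees and a squared loss
samples = [1.0, 2.0, 3.0]
weights = [0.5, 0.25, 0.25]
loss = reweighted_loss(samples, weights, lambda x: x * x)
# (0.5*1 + 0.25*4 + 0.25*9) / 1.0 = 3.75
```

The resampling variant would instead draw samples with probability proportional to the same weights and then compute an unweighted loss.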
    On the effectiveness of partial variance reduction in federated learning with heterogeneous data. (arXiv:2212.02191v2 [cs.LG] UPDATED)
    Data heterogeneity across clients is a key challenge in federated learning. Prior works address this by either aligning client and server models or using control variates to correct client model drift. Although these methods achieve fast convergence in convex or simple non-convex problems, their performance in over-parameterized models such as deep neural networks is lacking. In this paper, we first revisit the widely used FedAvg algorithm on a deep neural network to understand how data heterogeneity influences the gradient updates across the network's layers. We observe that while the feature extraction layers are learned efficiently by FedAvg, the substantial diversity of the final classification layers across clients impedes performance. Motivated by this, we propose to correct model drift by variance reduction only on the final layers. We demonstrate that this significantly outperforms existing benchmarks at a similar or lower communication cost. We furthermore prove the convergence rate of our algorithm.  ( 2 min )
    An Energy-aware and Fault-tolerant Deep Reinforcement Learning based approach for Multi-agent Patrolling Problems. (arXiv:2212.08230v4 [cs.AI] UPDATED)
    Autonomous vehicles are suited for continuous area patrolling problems. However, finding an optimal patrolling strategy can be challenging for many reasons. Firstly, patrolling environments are often complex and can include unknown environmental factors, such as wind or landscape. Secondly, autonomous vehicles can have failures or hardware constraints, such as limited battery life. Importantly, patrolling large areas often requires multiple agents that need to collectively coordinate their actions. In this work, we consider these limitations and propose an approach based on model-free, deep multi-agent reinforcement learning. In this approach, the agents are trained to patrol an environment with various unknown dynamics and factors. They can automatically recharge themselves to support continuous collective patrolling. A distributed homogeneous multi-agent architecture is proposed, where all patrolling agents execute identical policies locally based on their local observations and shared location information. This architecture provides a patrolling system that can tolerate agent failures and allow supplementary agents to be added to replace failed agents or to increase the overall patrol performance. The solution is validated through simulation experiments from multiple perspectives, including the overall patrol performance, the efficiency of battery recharging strategies, the overall fault tolerance, and the ability to cooperate with supplementary agents.  ( 2 min )
    A prediction and behavioural analysis of machine learning methods for modelling travel mode choice. (arXiv:2301.04404v2 [cs.LG] UPDATED)
    The emergence of a variety of Machine Learning (ML) approaches for travel mode choice prediction poses an interesting question to transport modellers: which models should be used for which applications? The answer to this question goes beyond simple predictive performance, and is instead a balance of many factors, including behavioural interpretability and explainability, computational complexity, and data efficiency. There is a growing body of research which attempts to compare the predictive performance of different ML classifiers with classical random utility models. However, existing studies typically analyse only the disaggregate predictive performance, ignoring other aspects affecting model choice. Furthermore, many studies are affected by technical limitations, such as the use of inappropriate validation schemes, incorrect sampling for hierarchical data, lack of external validation, and the exclusive use of discrete metrics. We address these limitations by conducting a systematic comparison of different modelling approaches, across multiple modelling problems, in terms of the key factors likely to affect model choice (out-of-sample predictive performance, accuracy of predicted market shares, extraction of behavioural indicators, and computational efficiency). We combine several real world datasets with synthetic datasets, where the data generation function is known. The results indicate that the models with the highest disaggregate predictive performance (namely extreme gradient boosting and random forests) provide poorer estimates of behavioural indicators and aggregate mode shares, and are more expensive to estimate, than other models, including deep neural networks and Multinomial Logit (MNL). It is further observed that the MNL model performs robustly in a variety of situations, though ML techniques can improve the estimates of behavioural indices such as Willingness to Pay.  ( 3 min )
    Reformulating van Rijsbergen's $F_{\beta}$ metric for weighted binary cross-entropy. (arXiv:2210.16458v2 [stat.ML] UPDATED)
    The separation of performance metrics from gradient-based loss functions may not always give optimal results and may miss vital aggregate information. This paper investigates incorporating a performance metric alongside differentiable loss functions to inform training outcomes. The goal is to guide model performance and interpretation by assuming statistical distributions on this performance metric for dynamic weighting. The focus is on van Rijsbergen's $F_{\beta}$ metric -- a popular choice for gauging classification performance. Through distributional assumptions on $F_{\beta}$, an intermediary link can be established to the standard binary cross-entropy via dynamic penalty weights. First, the $F_{\beta}$ metric is reformulated to facilitate assuming statistical distributions, with accompanying proofs for the cumulative distribution function. These probabilities are used within a knee-curve algorithm to find an optimal $\beta$, denoted $\beta_{opt}$. This $\beta_{opt}$ is used as a weight or penalty in the proposed weighted binary cross-entropy. Experiments on publicly available data with imbalanced classes mostly yield better and more interpretable results than the baseline. For example, for the IMDB text data with known labeling errors, a 14% boost is shown. This methodology can also provide better interpretability.  ( 2 min )
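For reference, the sketch below computes $F_{\beta}$ from confusion counts and a binary cross-entropy in which the positive class is penalized by $\beta^2$. The fixed weight is a simplified stand-in for the paper's dynamically chosen $\beta_{opt}$.

```python
import math

def f_beta(tp, fp, fn, beta):
    """van Rijsbergen's F_beta from confusion-matrix counts."""
    b2 = beta * beta
    return (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)

def weighted_bce(y_true, y_prob, beta):
    """Binary cross-entropy with the positive class weighted by beta^2
    (an illustrative fixed weight, not the paper's dynamic beta_opt)."""
    b2 = beta * beta
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, 1e-12), 1.0 - 1e-12)   # clip for numerical safety
        total -= b2 * y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

print(f_beta(tp=2, fp=1, fn=1, beta=1.0))   # F1 = 2/3
```

Setting `beta=1` recovers the usual F1 score and an unweighted cross-entropy.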
    FLSTRA: Federated Learning in Stratosphere. (arXiv:2302.00163v3 [cs.NI] UPDATED)
    We propose a federated learning (FL) in stratosphere (FLSTRA) system, where a high altitude platform station (HAPS) facilitates a large number of terrestrial clients to collaboratively learn a global model without sharing the training data. FLSTRA overcomes the challenges faced by FL in terrestrial networks, such as slow convergence and high communication delay due to limited client participation and multi-hop communications. HAPS leverages its altitude and size to allow the participation of more clients with line-of-sight (LOS) links and the placement of a powerful server. However, handling many clients at once introduces computing and transmission delays. Thus, we aim to obtain a delay-accuracy trade-off for FLSTRA. Specifically, we first develop a joint client selection and resource allocation algorithm for uplink and downlink to minimize the FL delay subject to the energy and quality-of-service (QoS) constraints. Second, we propose a communication and computation resource-aware (CCRA-FL) algorithm to achieve the target FL accuracy while deriving an upper bound for its convergence rate. The formulated problem is non-convex; thus, we propose an iterative algorithm to solve it. Simulation results demonstrate the effectiveness of the proposed FLSTRA system, compared to terrestrial benchmarks, in terms of FL delay and accuracy.  ( 2 min )
    Time-Warping Invariant Quantum Recurrent Neural Networks via Quantum-Classical Adaptive Gating. (arXiv:2301.08173v3 [quant-ph] UPDATED)
    Adaptive gating plays a key role in temporal data processing via classical recurrent neural networks (RNN), as it facilitates retention of past information necessary to predict the future, providing a mechanism that preserves invariance to time warping transformations. This paper builds on quantum recurrent neural networks (QRNNs), a dynamic model with quantum memory, to introduce a novel class of temporal data processing quantum models that preserve invariance to time-warping transformations of the (classical) input-output sequences. The model, referred to as time warping-invariant QRNN (TWI-QRNN), augments a QRNN with a quantum-classical adaptive gating mechanism that chooses whether to apply a parameterized unitary transformation at each time step as a function of the past samples of the input sequence via a classical recurrent model. The TWI-QRNN model class is derived from first principles, and its capacity to successfully implement time-warping transformations is experimentally demonstrated on examples with classical or quantum dynamics.  ( 2 min )
    Doubly Smoothed GDA for Constrained Nonconvex-Nonconcave Minimax Optimization. (arXiv:2212.12978v4 [math.OC] UPDATED)
    Nonconvex-nonconcave minimax optimization has received intense attention over the last decade due to its broad applications in machine learning. Unfortunately, most existing algorithms cannot be guaranteed to converge globally and even suffer from limit cycles. To address this issue, we propose a novel single-loop algorithm called the doubly smoothed gradient descent ascent method (DSGDA), which naturally balances the primal and dual updates. The proposed DSGDA can get rid of limit cycles in various challenging nonconvex-nonconcave examples in the literature, including Forsaken, Bilinearly-coupled minimax, Sixth-order polynomial, and PolarGame. We further show that under a one-sided Kurdyka-\L{}ojasiewicz condition with exponent $\theta\in(0,1)$ (resp. convex primal/concave dual function), DSGDA can find a game-stationary point with an iteration complexity of $\mathcal{O}(\epsilon^{-2\max\{2\theta,1\}})$ (resp. $\mathcal{O}(\epsilon^{-4})$). These match the best results for single-loop algorithms that solve nonconvex-concave or convex-nonconcave minimax problems, or problems satisfying the rather restrictive one-sided Polyak-\L{}ojasiewicz condition. Our work demonstrates, for the first time, the possibility of having a simple and unified single-loop algorithm for solving nonconvex-nonconcave, nonconvex-concave, and convex-nonconcave minimax problems.  ( 2 min )
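The smoothing idea can be seen on the classic bilinear game $f(x,y)=xy$, where plain simultaneous gradient descent ascent spirals away from the origin. The toy single loop below adds proximal anchor variables on both sides (slowly tracked exponential averages); the step sizes and the exact update are illustrative, not the paper's DSGDA.

```python
# Toy doubly-smoothed GDA on f(x, y) = x * y. Plain GDA cycles/diverges
# on this game; with proximal anchors z, v on both sides, the iterates
# contract toward the unique saddle point (0, 0).

def smoothed_gda(steps=2000, alpha=0.05, mu=1.0, tau=0.05):
    x, y, z, v = 1.0, 1.0, 1.0, 1.0
    for _ in range(steps):
        gx = y + mu * (x - z)        # grad_x f plus primal proximal term
        gy = x - mu * (y - v)        # grad_y f minus dual proximal term
        x, y = x - alpha * gx, y + alpha * gy
        z += tau * (x - z)           # slowly moving anchors
        v += tau * (y - v)
    return x, y

x, y = smoothed_gda()
```

With these (illustrative) parameters the iterates shrink toward the origin instead of orbiting it.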
    Exploring Local Explanations of Nonlinear Models Using Animated Linear Projections. (arXiv:2205.05359v2 [stat.ML] UPDATED)
    The increased predictive power of machine learning models comes at the cost of increased complexity and loss of interpretability, particularly in comparison to parametric statistical models. This trade-off has led to the emergence of eXplainable AI (XAI), which provides methods, such as local explanations (LEs) and local variable attributions (LVAs), to shed light on how a model uses predictors to arrive at a prediction. These provide a point estimate of the linear variable importance in the vicinity of a single observation. However, LVAs tend not to effectively handle association between predictors. To understand how the interaction between predictors affects the variable importance estimate, we can convert LVAs into linear projections and use the radial tour. This is also useful for understanding how a model has made a mistake, the effect of outliers, or the clustering of observations. The approach is illustrated with examples from categorical (penguin species, chocolate types) and quantitative (soccer/football salaries, house prices) response models. The methods are implemented in the R package cheem, available on CRAN.  ( 2 min )
    CI-GNN: A Granger Causality-Inspired Graph Neural Network for Interpretable Brain Network-Based Psychiatric Diagnosis. (arXiv:2301.01642v2 [stat.ML] UPDATED)
    There is a recent trend to leverage the power of graph neural networks (GNNs) for brain-network based psychiatric diagnosis, which, in turn, also motivates an urgent need for psychiatrists to fully understand the decision behavior of the GNNs used. However, most existing GNN explainers are either post-hoc, in which another interpretive model needs to be created to explain a well-trained GNN, or do not consider the causal relationship between the extracted explanation and the decision, such that the explanation itself contains spurious correlations and suffers from weak faithfulness. In this work, we propose a Granger causality-inspired graph neural network (CI-GNN), a built-in interpretable model that is able to identify the most influential subgraph (i.e., functional connectivity within brain regions) that is causally related to the decision (e.g., major depressive disorder patients or healthy controls), without the training of an auxiliary interpretive network. CI-GNN learns disentangled subgraph-level representations $\alpha$ and $\beta$ that encode, respectively, the causal and noncausal aspects of the original graph under a graph variational autoencoder framework, regularized by a conditional mutual information (CMI) constraint. We theoretically justify the validity of the CMI regularization in capturing the causal relationship. We also empirically evaluate the performance of CI-GNN against three baseline GNNs and four state-of-the-art GNN explainers on synthetic data and three large-scale brain disease datasets. We observe that CI-GNN achieves the best performance in a wide range of metrics and provides more reliable and concise explanations which have clinical evidence.  ( 3 min )
    Almost Surely $\sqrt{T}$ Regret Bound for Adaptive LQR. (arXiv:2301.05537v3 [math.OC] UPDATED)
    The Linear-Quadratic Regulation (LQR) problem with unknown system parameters has been widely studied, but it has remained unclear whether $\tilde{ \mathcal{O}}(\sqrt{T})$ regret, which is the best known dependence on time, can be achieved almost surely. In this paper, we propose an adaptive LQR controller with an almost-sure $\tilde{ \mathcal{O}}(\sqrt{T})$ regret upper bound. The controller features a circuit-breaking mechanism, which circumvents potential safety breaches and guarantees the convergence of the system parameter estimate, but is shown to be triggered only finitely often and hence has a negligible effect on the asymptotic performance of the controller. The proposed controller is also validated via simulation on the Tennessee Eastman Process (TEP), a commonly used industrial process example.  ( 2 min )
    Adaptive Estimation of Graphical Models under Total Positivity. (arXiv:2210.15471v2 [stat.ML] UPDATED)
    We consider the problem of estimating (diagonally dominant) M-matrices as precision matrices in Gaussian graphical models. These models exhibit intriguing properties, such as the existence of the maximum likelihood estimator with merely two observations for M-matrices \citep{lauritzen2019maximum,slawski2015estimation} and even one observation for diagonally dominant M-matrices \citep{truell2021maximum}. We propose an adaptive multiple-stage estimation method that refines the estimate by solving a weighted $\ell_1$-regularized problem at each stage. Furthermore, we develop a unified framework based on the gradient projection method to solve the regularized problem, incorporating distinct projections to handle the constraints of M-matrices and diagonally dominant M-matrices. A theoretical analysis of the estimation error is provided. Our method outperforms state-of-the-art methods in precision matrix estimation and graph edge identification, as evidenced by synthetic and financial time-series data sets.  ( 2 min )
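One ingredient is easy to show concretely: the sign constraint of an M-matrix forces non-positive off-diagonal entries, so a basic projection step simply clips positive off-diagonals to zero. This is a hypothetical minimal fragment of such a projection; the paper's gradient projection method also handles diagonal dominance and positive definiteness, which are omitted here.

```python
# Minimal sketch of the sign projection used for M-matrix constraints:
# keep the diagonal, clip positive off-diagonal entries to zero.

def project_m_matrix_signs(mat):
    n = len(mat)
    return [[mat[i][j] if i == j else min(mat[i][j], 0.0) for j in range(n)]
            for i in range(n)]

A = [[2.0, 0.3, -0.5],
     [0.3, 1.5, -0.2],
     [-0.5, -0.2, 1.0]]
P = project_m_matrix_signs(A)   # the 0.3 off-diagonal entries become 0.0
```

Such a projection would be applied after each gradient step of the weighted $\ell_1$-regularized subproblem.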
    Policy Mirror Ascent for Efficient and Independent Learning in Mean Field Games. (arXiv:2212.14449v2 [math.OC] UPDATED)
    Mean-field games have been used as a theoretical tool to obtain an approximate Nash equilibrium for symmetric and anonymous $N$-player games. However, limiting applicability, existing theoretical results assume variations of a "population generative model", which allows arbitrary modifications of the population distribution by the learning algorithm. Moreover, learning algorithms typically work on abstract population simulators instead of the $N$-player game itself. Instead, we show that $N$ agents running policy mirror ascent converge to the Nash equilibrium of the regularized game within $\widetilde{\mathcal{O}}(\varepsilon^{-2})$ samples from a single sample trajectory without a population generative model, up to a standard $\mathcal{O}(\frac{1}{\sqrt{N}})$ error due to the mean field. Departing from the literature, instead of working with the best-response map, we first show that a policy mirror ascent map can be used to construct a contractive operator having the Nash equilibrium as its fixed point. We analyze single-path TD learning for $N$-agent games, proving sample complexity guarantees by only using a sample path from the $N$-agent simulator without a population generative model. Furthermore, we demonstrate that our methodology allows for independent learning by $N$ agents with finite sample guarantees.  ( 2 min )
    Huber-energy measure quantization. (arXiv:2212.08162v2 [stat.ML] UPDATED)
    We describe a measure quantization procedure, i.e., an algorithm which finds the best approximation of a target probability law (and more generally a signed finite variation measure) by a sum of $Q$ Dirac masses ($Q$ being the quantization parameter). The procedure is implemented by minimizing the statistical distance between the original measure and its quantized version; the distance is built from a negative definite kernel and, if necessary, can be computed on the fly and fed to a stochastic optimization algorithm (such as SGD, Adam, ...). We investigate theoretically the fundamental question of existence of the optimal measure quantizer and identify the kernel properties required to guarantee suitable behavior. We propose two best linear unbiased (BLUE) estimators for the squared statistical distance and use them in an unbiased procedure, called HEMQ, to find the optimal quantization. We test HEMQ on several databases: multi-dimensional Gaussian mixtures, Wiener space cubature, Italian wine cultivars, and the MNIST image database. The results indicate that the HEMQ algorithm is robust and versatile and, for the class of Huber-energy kernels, matches the expected intuitive behavior.  ( 2 min )
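A minimal 1-D version of the idea can be written with the plain energy kernel $k(x,y)=-|x-y|$: place $Q$ Dirac masses by subgradient descent on the energy distance to an empirical target. HEMQ's Huber-energy kernels, unbiased BLUE distance estimators, and stochastic optimizers are all omitted; only the core objective is illustrated.

```python
import random

# Sketch: quantize an empirical 1-D measure into Q Dirac masses by
# (sub)gradient descent on the energy distance. The attraction term pulls
# each mass toward the data; the repulsion term spreads the masses out.

def sign(a):
    return (a > 0) - (a < 0)

def quantize(target, Q=2, lr=0.05, steps=1000, seed=0):
    rng = random.Random(seed)
    q = [rng.uniform(min(target), max(target)) for _ in range(Q)]
    N = len(target)
    for _ in range(steps):
        grads = []
        for i in range(Q):
            attract = 2.0 / (Q * N) * sum(sign(q[i] - x) for x in target)
            repel = -2.0 / (Q * Q) * sum(sign(q[i] - q[k])
                                         for k in range(Q) if k != i)
            grads.append(attract + repel)
        q = [qi - lr * g for qi, g in zip(q, grads)]
    return q

# target: two well-separated clusters near 0 and near 10
target = [i / 50.0 - 0.5 for i in range(50)] + \
         [10.0 + i / 50.0 - 0.5 for i in range(50)]
points = quantize(target)
```

The two masses settle near the medians of the two clusters.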
    Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective. (arXiv:2208.07365v2 [cs.CV] UPDATED)
    Unsupervised video domain adaptation is a practical yet challenging task. In this work, for the first time, we tackle it from a disentanglement view. Our key idea is to handle the spatial and temporal domain divergence separately through disentanglement. Specifically, we consider the generation of cross-domain videos from two sets of latent factors, one encoding the static information and the other encoding the dynamic information. A Transfer Sequential VAE (TranSVAE) framework is then developed to model such generation. To better support adaptation, we propose several objectives to constrain the latent factors. With these constraints, the spatial divergence can be readily removed by disentangling the static domain-specific information out, and the temporal divergence is further reduced through adversarial learning at both the frame and video levels. Extensive experiments on the UCF-HMDB, Jester, and Epic-Kitchens datasets verify the effectiveness and superiority of TranSVAE compared with several state-of-the-art methods. The code, with reproducible results, is publicly accessible.  ( 2 min )
    Conformal Credal Self-Supervised Learning. (arXiv:2205.15239v2 [stat.ML] UPDATED)
    In semi-supervised learning, the paradigm of self-training refers to the idea of learning from pseudo-labels suggested by the learner itself. Across various domains, corresponding methods have proven effective and achieve state-of-the-art performance. However, pseudo-labels typically stem from ad-hoc heuristics, relying on the quality of the predictions though without guaranteeing their validity. One such method, so-called credal self-supervised learning, maintains pseudo-supervision in the form of sets of (instead of single) probability distributions over labels, thereby allowing for a flexible yet uncertainty-aware labeling. Again, however, there is no justification beyond empirical effectiveness. To address this deficiency, we make use of conformal prediction, an approach that comes with guarantees on the validity of set-valued predictions. As a result, the construction of credal sets of labels is supported by a rigorous theoretical foundation, leading to better calibrated and less error-prone supervision for unlabeled data. Along with this, we present effective algorithms for learning from credal self-supervision. An empirical study demonstrates excellent calibration properties of the pseudo-supervision, as well as the competitiveness of our method on several benchmark datasets.  ( 2 min )
    Agent market orders representation through a contrastive learning approach. (arXiv:2306.05987v1 [q-fin.ST])
    Thanks to access to labeled orders on CAC40 data from Euronext, we are able to analyse agents' behaviours in the market based on their placed orders. In this study, we construct a self-supervised learning model using triplet loss to effectively learn the representation of agent market orders. This learned representation makes various downstream tasks feasible. In this work, we utilise the K-means clustering algorithm on the learned representation vectors of agent orders to identify distinct behaviour types within each cluster.
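The triplet loss at the core of such a model is simple to state: push the anchor's embedding closer to a positive (e.g., an order from the same agent) than to a negative (a different agent) by at least a margin. A plain-Python sketch on toy vectors, with squared Euclidean distance assumed; real models would compute the embeddings with a neural network.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: zero once the negative is farther
    than the positive by at least `margin`."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)

# positive already closer than the negative by more than the margin
print(triplet_loss([0.0, 0.0], [0.1, 0.0], [2.0, 0.0]))  # 0.0
# margin violated: loss is positive and drives the embeddings apart
print(triplet_loss([0.0, 0.0], [1.0, 0.0], [1.2, 0.0]))
```

Minimizing this over many (anchor, positive, negative) triples is what shapes the representation space that K-means later clusters.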
    Out-of-Variable Generalization for Discriminative Models. (arXiv:2304.07896v2 [cs.LG] UPDATED)
    The ability of an agent to do well in new environments is a critical aspect of intelligence. In machine learning, this ability is known as $\textit{strong}$ or $\textit{out-of-distribution}$ generalization. However, merely considering differences in data distributions is inadequate for fully capturing differences between learning environments. In the present paper, we investigate $\textit{out-of-variable}$ generalization, which pertains to an agent's generalization capabilities concerning environments with variables that were never jointly observed before. This skill closely reflects the process of animate learning: we, too, explore Nature by probing, observing, and measuring $\textit{subsets}$ of variables at any given time. Mathematically, $\textit{out-of-variable}$ generalization requires the efficient re-use of past marginal information, i.e., information over subsets of previously observed variables. We study this problem, focusing on prediction tasks across environments that contain overlapping, yet distinct, sets of causes. We show that after fitting a classifier, the residual distribution in one environment reveals the partial derivative of the true generating function with respect to the unobserved causal parent in that environment. We leverage this information and propose a method that exhibits non-trivial out-of-variable generalization performance when facing an overlapping, yet distinct, set of causal predictors.
    Hierarchical forecasting for aggregated curves with an application to day-ahead electricity price auctions. (arXiv:2305.16255v1 [stat.AP] CROSS LISTED)
    Aggregated curves are common structures in economics and finance, and the most prominent examples are supply and demand curves. In this study, we exploit the fact that all aggregated curves have an intrinsic hierarchical structure, and thus hierarchical reconciliation methods can be used to improve the forecast accuracy. We provide an in-depth theory on how aggregated curves can be constructed or deconstructed, and conclude that these methods are equivalent under weak assumptions. We consider multiple reconciliation methods for aggregated curves, including previously established bottom-up, top-down, and linear optimal reconciliation approaches. We also present a new benchmark reconciliation method called 'aggregated-down' with similar complexity to bottom-up and top-down approaches, but it tends to provide better accuracy in this setup. We conducted an empirical forecasting study on the German day-ahead power auction market by predicting the demand and supply curves, where their equilibrium determines the electricity price for the next day. Our results demonstrate that hierarchical reconciliation methods can be used to improve the forecasting accuracy of aggregated curves.
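The two classical baselines are easy to sketch on a two-level hierarchy (one total series, two bottom series): bottom-up sums the bottom forecasts, while top-down splits the total forecast by historical proportions (hypothetical numbers below). The paper's aggregated-down method and optimal linear reconciliation are not shown.

```python
# Minimal reconciliation sketch on a hierarchy [total, bottom_1, bottom_2].

def bottom_up(bottom_forecasts):
    """Coherent by construction: the total is the sum of the bottoms."""
    total = sum(bottom_forecasts)
    return [total] + list(bottom_forecasts)

def top_down(total_forecast, historical_shares):
    """Split the total forecast by fixed historical proportions."""
    assert abs(sum(historical_shares) - 1.0) < 1e-9
    return [total_forecast] + [total_forecast * s for s in historical_shares]

# base (unreconciled) forecasts: total = 100, bottoms = 40 and 55
print(bottom_up([40.0, 55.0]))            # [95.0, 40.0, 55.0]
print(top_down(100.0, [0.45, 0.55]))      # approximately [100.0, 45.0, 55.0]
```

Either way, the reconciled forecasts satisfy the aggregation constraint exactly, which the base forecasts (100 vs. 40 + 55) did not.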
    Tighter Lower Bounds for Shuffling SGD: Random Permutations and Beyond. (arXiv:2303.07160v2 [cs.LG] UPDATED)
    We study convergence lower bounds of without-replacement stochastic gradient descent (SGD) for solving smooth (strongly-)convex finite-sum minimization problems. Unlike most existing results focusing on final iterate lower bounds in terms of the number of components $n$ and the number of epochs $K$, we seek bounds for arbitrary weighted average iterates that are tight in all factors including the condition number $\kappa$. For SGD with Random Reshuffling, we present lower bounds that have tighter $\kappa$ dependencies than existing bounds. Our results are the first to perfectly close the gap between lower and upper bounds for weighted average iterates in both strongly-convex and convex cases. We also prove weighted average iterate lower bounds for arbitrary permutation-based SGD, which apply to all variants that carefully choose the best permutation. Our bounds improve the existing bounds in factors of $n$ and $\kappa$ and thereby match the upper bounds shown for a recently proposed algorithm called GraB.
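For context, Random Reshuffling is the sampling scheme these lower bounds concern: each epoch visits every component exactly once in a fresh random order. A toy run on the finite sum $f(x)=\frac{1}{n}\sum_i \frac{1}{2}(x-a_i)^2$, whose minimizer is the mean of the $a_i$; the step-size schedule is illustrative.

```python
import random

def rr_sgd(a, epochs=50, seed=0):
    """SGD with Random Reshuffling (without-replacement sampling)."""
    rng = random.Random(seed)
    idx = list(range(len(a)))
    x = 0.0
    for k in range(epochs):
        rng.shuffle(idx)               # fresh permutation each epoch
        lr = 1.0 / (k + 2)             # illustrative decaying step size
        for i in idx:
            x -= lr * (x - a[i])       # gradient of (x - a_i)^2 / 2
    return x

a = [1.0, 2.0, 3.0, 6.0]               # minimizer is mean(a) = 3.0
x = rr_sgd(a)
```

The lower bounds in the paper concern weighted averages of such iterates; this sketch only shows the sampling mechanics and that the final iterate lands near the minimizer.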
    Reflected Diffusion Models. (arXiv:2304.04740v3 [stat.ML] UPDATED)
    Score-based diffusion models learn to reverse a stochastic differential equation that maps data to noise. However, for complex tasks, numerical error can compound and result in highly unnatural samples. Previous work mitigates this drift with thresholding, which projects to the natural data domain (such as pixel space for images) after each diffusion step, but this leads to a mismatch between the training and generative processes. To incorporate data constraints in a principled manner, we present Reflected Diffusion Models, which instead reverse a reflected stochastic differential equation evolving on the support of the data. Our approach learns the perturbed score function through a generalized score matching loss and extends key components of standard diffusion models including diffusion guidance, likelihood-based training, and ODE sampling. We also bridge the theoretical gap with thresholding: such schemes are just discretizations of reflected SDEs. On standard image benchmarks, our method is competitive with or surpasses the state of the art without architectural modifications and, for classifier-free guidance, our approach enables fast exact sampling with ODEs and produces more faithful samples under high guidance weight.
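The mechanics of a reflected SDE are easy to simulate in one dimension: take Euler-Maruyama steps and fold any excursion back into the support (here $[0,1]$, with zero drift assumed). Score learning, guidance, and ODE sampling from the paper are omitted; this only illustrates how reflection keeps the process on the data domain.

```python
import random

def reflect_into_unit(x):
    """Fold x back into [0, 1], handling multiple boundary crossings."""
    while x < 0.0 or x > 1.0:
        x = -x if x < 0.0 else 2.0 - x
    return x

def simulate(steps=1000, dt=0.01, seed=0):
    """Euler-Maruyama for a driftless reflected Brownian motion on [0, 1]."""
    rng = random.Random(seed)
    x, path = 0.5, []
    for _ in range(steps):
        x += rng.gauss(0.0, dt ** 0.5)   # pure-noise step
        x = reflect_into_unit(x)
        path.append(x)
    return path

path = simulate()
```

Every sample stays inside the domain by construction, with no post-hoc thresholding step.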
    Adaptivity Complexity for Causal Graph Discovery. (arXiv:2306.05781v1 [cs.LG])
    Causal discovery from interventional data is an important problem, where the task is to design an interventional strategy that learns the hidden ground truth causal graph $G(V,E)$ on $|V| = n$ nodes while minimizing the number of performed interventions. Most prior interventional strategies broadly fall into two categories: non-adaptive and adaptive. Non-adaptive strategies decide on a single fixed set of interventions to be performed, while adaptive strategies can decide which nodes to intervene on sequentially based on past interventions. While adaptive algorithms may use exponentially fewer interventions than their non-adaptive counterparts, there are practical concerns that constrain the amount of adaptivity allowed. Motivated by this trade-off, we study the problem of $r$-adaptivity, where the algorithm designer recovers the causal graph in a total of $r$ sequential rounds while trying to minimize the total number of interventions. For this problem, we provide an $r$-adaptive algorithm that achieves an $O(\min\{r,\log n\} \cdot n^{1/\min\{r,\log n\}})$ approximation with respect to the verification number, a well-known lower bound for adaptive algorithms. Furthermore, for every $r$, we show that our approximation is tight. Our definition of $r$-adaptivity interpolates nicely between the non-adaptive ($r=1$) and fully adaptive ($r=n$) settings, where our approximation simplifies to $O(n)$ and $O(\log n)$ respectively, matching the best-known approximation guarantees for both extremes. Our results also extend naturally to bounded-size interventions.
    Data-Adaptive Probabilistic Likelihood Approximation for Ordinary Differential Equations. (arXiv:2306.05566v1 [stat.ML])
Parameter inference for ordinary differential equations (ODEs) is of fundamental importance in many scientific applications. While ODE solutions are typically approximated by deterministic algorithms, new research on probabilistic solvers indicates that they produce more reliable parameter estimates by better accounting for numerical errors. However, many ODE systems are highly sensitive to their parameter values. This produces deep local minima in the likelihood function -- a problem which existing probabilistic solvers have yet to resolve. Here, we show that a Bayesian filtering paradigm for probabilistic ODE solution can dramatically reduce sensitivity to parameters by learning from the noisy ODE observations in a data-adaptive manner. Our method is applicable to ODEs with partially unobserved components and with arbitrary non-Gaussian noise. Several examples demonstrate that it is more accurate than existing probabilistic ODE solvers, and in some cases even more accurate than the exact ODE likelihood.
    A Dynamical Graph Prior for Relational Inference. (arXiv:2306.06041v1 [cs.LG])
    Relational inference aims to identify interactions between parts of a dynamical system from the observed dynamics. Current state-of-the-art methods fit a graph neural network (GNN) on a learnable graph to the dynamics. They use one-step message-passing GNNs -- intuitively the right choice since non-locality of multi-step or spectral GNNs may confuse direct and indirect interactions. But the \textit{effective} interaction graph depends on the sampling rate and it is rarely localized to direct neighbors, leading to local minima for the one-step model. In this work, we propose a \textit{dynamical graph prior} (DYGR) for relational inference. The reason we call it a prior is that, contrary to established practice, it constructively uses error amplification in high-degree non-local polynomial filters to generate good gradients for graph learning. To deal with non-uniqueness, DYGR simultaneously fits a ``shallow'' one-step model with shared graph topology. Experiments show that DYGR reconstructs graphs far more accurately than earlier methods, with remarkable robustness to under-sampling. Since appropriate sampling rates for unknown dynamical systems are not known a priori, this robustness makes DYGR suitable for real applications in scientific machine learning.
    Domain-Agnostic Batch Bayesian Optimization with Diverse Constraints via Bayesian Quadrature. (arXiv:2306.05843v1 [cs.LG])
Real-world optimisation problems often feature complex combinations of (1) diverse constraints, (2) discrete and mixed spaces, and (3) are highly parallelisable. (4) There are also cases where the objective function cannot be queried if unknown constraints are not satisfied, e.g. in drug discovery, safety in animal experiments (unknown constraints) must be established before human clinical trials (querying objective function) may proceed. However, most existing works target each of the above three problems in isolation and do not consider (4) unknown constraints with query rejection. For problems with diverse constraints and/or unconventional input spaces, it is difficult to apply these techniques as they are often mutually incompatible. We propose cSOBER, a domain-agnostic prudent parallel active sampler for Bayesian optimisation, based on SOBER of Adachi et al. (2023). We consider infeasibility under unknown constraints as a type of integration error that we can estimate. We propose a theoretically-driven approach that propagates such error as a tolerance in the quadrature precision that automatically balances exploitation and exploration with the expected rejection rate. Moreover, our method flexibly accommodates diverse constraints and/or discrete and mixed spaces via adaptive tolerance, including conventional zero-risk cases. We show that cSOBER outperforms competitive baselines on diverse real-world blackbox-constrained problems, including safety-constrained drug discovery, and human-relationship-aware team optimisation over graph-structured space.
    Federated Learning You May Communicate Less Often!. (arXiv:2306.05862v1 [stat.ML])
We investigate the generalization error of statistical learning models in a Federated Learning (FL) setting. Specifically, we study the evolution of the generalization error with the number of communication rounds between the clients and the parameter server, i.e., the effect on the generalization error of how often the local models as computed by the clients are aggregated at the parameter server. We establish PAC-Bayes and rate-distortion theoretic bounds on the generalization error that account explicitly for the effect of the number of rounds, say $ R \in \mathbb{N}$, in addition to the number of participating devices $K$ and individual datasets size $n$. The bounds, which apply in their generality for a large class of loss functions and learning algorithms, appear to be the first of their kind for the FL setting. Furthermore, we apply our bounds to FL-type Support Vector Machines (FSVM); and we derive (more) explicit bounds on the generalization error in this case. In particular, we show that the generalization error of FSVM increases with $R$, suggesting that more frequent communication with the parameter server diminishes the generalization power of such learning algorithms. Combined with the fact that the empirical risk generally decreases for larger values of $R$, this indicates that $R$ might be a parameter to optimize in order to minimize the population risk of FL algorithms. Moreover, specialized to the case $R=1$ (sometimes referred to as "one-shot" FL or distributed learning) our bounds suggest that the generalization error of the FL setting decreases faster than that of centralized learning by a factor of $\mathcal{O}(\sqrt{\log(K)/K})$, thereby generalizing recent findings in this direction to arbitrary loss functions and algorithms. The results of this paper are also validated experimentally.
    Asymptotically efficient one-step stochastic gradient descent. (arXiv:2306.05896v1 [math.ST])
A generic, fast and asymptotically efficient method for parametric estimation is described. It is based on the stochastic gradient descent on the log-likelihood function corrected by a single step of the Fisher scoring algorithm. We show theoretically and by simulations in the i.i.d. setting that it is an interesting alternative to the usual stochastic gradient descent with averaging or the adaptive stochastic gradient descent.
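The one-step idea can be illustrated on a toy exponential model. This is a hedged sketch, not the paper's algorithm: a crude preliminary estimate (standing in for an SGD pass) is corrected by a single Fisher-scoring step, which brings it very close to the MLE. For $X \sim \mathrm{Exp}(\lambda)$ the per-observation score is $1/\lambda - x$ and the Fisher information is $I(\lambda) = 1/\lambda^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1 / 2.0, size=5000)   # true rate lambda = 2

# Crude preliminary estimate from a subsample
# (stands in for the paper's SGD pass over the log-likelihood).
lam0 = 1.0 / np.mean(x[:500])

# One Fisher-scoring correction: lam1 = lam0 + I(lam0)^{-1} * mean score.
score_bar = 1.0 / lam0 - np.mean(x)
lam1 = lam0 + lam0**2 * score_bar

mle = 1.0 / np.mean(x)   # exact MLE for comparison
```

A short calculation shows $|\lambda_1 - \hat\lambda_{\mathrm{MLE}}| = \bar{x}\,(\lambda_0 - \hat\lambda_{\mathrm{MLE}})^2$, i.e. the one-step estimator is second-order close to the MLE.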
    Conformal Credal Self-Supervised Learning. (arXiv:2205.15239v2 [stat.ML] UPDATED)
In semi-supervised learning, the paradigm of self-training refers to the idea of learning from pseudo-labels suggested by the learner itself. Across various domains, corresponding methods have proven effective and achieve state-of-the-art performance. However, pseudo-labels typically stem from ad-hoc heuristics, relying on the quality of the predictions though without guaranteeing their validity. One such method, so-called credal self-supervised learning, maintains pseudo-supervision in the form of sets of (instead of single) probability distributions over labels, thereby allowing for a flexible yet uncertainty-aware labeling. Again, however, there is no justification beyond empirical effectiveness. To address this deficiency, we make use of conformal prediction, an approach that comes with guarantees on the validity of set-valued predictions. As a result, the construction of credal sets of labels is supported by a rigorous theoretical foundation, leading to better calibrated and less error-prone supervision for unlabeled data. Along with this, we present effective algorithms for learning from credal self-supervision. An empirical study demonstrates excellent calibration properties of the pseudo-supervision, as well as the competitiveness of our method on several benchmark datasets.
    L0Learn: A Scalable Package for Sparse Learning using L0 Regularization. (arXiv:2202.04820v2 [cs.LG] UPDATED)
We present L0Learn: an open-source package for sparse linear regression and classification using $\ell_0$ regularization. L0Learn implements scalable, approximate algorithms, based on coordinate descent and local combinatorial optimization. The package is built using C++ and has user-friendly R and Python interfaces. L0Learn can address problems with millions of features, achieving competitive run times and statistical performance with state-of-the-art sparse learning packages. L0Learn is available on both CRAN and GitHub (https://cran.r-project.org/package=L0Learn and https://github.com/hazimehh/L0Learn).
    Multi-Epoch Matrix Factorization Mechanisms for Private Machine Learning. (arXiv:2211.06530v2 [cs.LG] UPDATED)
We introduce new differentially private (DP) mechanisms for gradient-based machine learning (ML) with multiple passes (epochs) over a dataset, substantially improving the achievable privacy-utility-computation tradeoffs. We formalize the problem of DP mechanisms for adaptive streams with multiple participations and introduce a non-trivial extension of online matrix factorization DP mechanisms to our setting. This includes establishing the necessary theory for sensitivity calculations and efficient computation of optimal matrices. For some applications like $>\!\! 10,000$ SGD steps, applying these optimal techniques becomes computationally expensive. We thus design an efficient Fourier-transform-based mechanism with only a minor utility loss. Extensive empirical evaluation on both example-level DP for image classification and user-level DP for language modeling demonstrate substantial improvements over all previous methods, including the widely-used DP-SGD. Though our primary application is to ML, our main DP results are applicable to arbitrary linear queries and hence may have much broader applicability.
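For context, the DP-SGD baseline the abstract compares against can be sketched in a few lines: clip each per-example gradient to a fixed norm, sum, and add Gaussian noise. This is a hedched toy for linear regression, not the paper's matrix-factorization mechanism; the hyperparameters (`clip_norm`, `noise_mult`, `lr`) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_sgd_step(w, X, y, clip_norm=1.0, noise_mult=1.1, lr=0.1):
    """One DP-SGD step for squared-error linear regression:
    per-example gradient clipping followed by Gaussian noise."""
    grads = 2 * (X @ w - y)[:, None] * X                  # per-example gradients
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads / np.maximum(1.0, norms / clip_norm)    # clip each to clip_norm
    g = grads.sum(axis=0) + noise_mult * clip_norm * rng.standard_normal(w.shape)
    return w - lr * g / len(X)

X = rng.standard_normal((64, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w = np.zeros(3)
for _ in range(200):
    w = dp_sgd_step(w, X, y)
```

Despite clipping and noise, the iterate ends up close to the true weights on this easy noiseless problem; the privacy cost of running many such steps is exactly what multi-epoch mechanisms aim to reduce.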
    Adaptive Estimation of Graphical Models under Total Positivity. (arXiv:2210.15471v2 [stat.ML] UPDATED)
We consider the problem of estimating (diagonally dominant) M-matrices as precision matrices in Gaussian graphical models. These models exhibit intriguing properties, such as the existence of the maximum likelihood estimator with merely two observations for M-matrices \citep{lauritzen2019maximum,slawski2015estimation} and even one observation for diagonally dominant M-matrices \citep{truell2021maximum}. We propose an adaptive multiple-stage estimation method that refines the estimate by solving a weighted $\ell_1$-regularized problem at each stage. Furthermore, we develop a unified framework based on the gradient projection method to solve the regularized problem, incorporating distinct projections to handle the constraints of M-matrices and diagonally dominant M-matrices. A theoretical analysis of the estimation error is provided. Our method outperforms state-of-the-art methods in precision matrix estimation and graph edge identification, as evidenced by synthetic and financial time-series data sets.
    MonoFlow: Rethinking Divergence GANs via the Perspective of Wasserstein Gradient Flows. (arXiv:2302.01075v3 [stat.ML] UPDATED)
The conventional understanding of adversarial training in generative adversarial networks (GANs) is that the discriminator is trained to estimate a divergence, and the generator learns to minimize this divergence. We argue that despite the fact that many variants of GANs were developed following this paradigm, the current theoretical understanding of GANs and their practical algorithms are inconsistent. In this paper, we leverage Wasserstein gradient flows which characterize the evolution of particles in the sample space, to gain theoretical insights and algorithmic inspiration for GANs. We introduce a unified generative modeling framework - MonoFlow: the particle evolution is rescaled via a monotonically increasing mapping of the log density ratio. Under our framework, adversarial training can be viewed as a procedure first obtaining MonoFlow's vector field via training the discriminator and the generator learns to draw the particle flow defined by the corresponding vector field. We also reveal the fundamental difference between variational divergence minimization and adversarial training. This analysis helps us to identify what types of generator loss functions can lead to the successful training of GANs and suggest that GANs may have more loss designs beyond the literature (e.g., non-saturated loss), as long as they realize MonoFlow. Consistent empirical studies are included to validate the effectiveness of our framework.
    Deterministic equivalent of the Conjugate Kernel matrix associated to Artificial Neural Networks. (arXiv:2306.05850v1 [math.PR])
We study the Conjugate Kernel associated to a multi-layer linear-width feed-forward neural network with random weights, biases and data. We show that the empirical spectral distribution of the Conjugate Kernel converges to a deterministic limit. More precisely we obtain a deterministic equivalent for its Stieltjes transform and its resolvent, with quantitative bounds involving both the dimension and the spectral parameter. The limiting equivalent objects are described by iterating free convolution of measures and classical matrix operations involving the parameters of the model.
    Monte Carlo inference for semiparametric Bayesian regression. (arXiv:2306.05498v1 [stat.ME])
Data transformations are essential for broad applicability of parametric regression models. However, for Bayesian analysis, joint inference of the transformation and model parameters typically involves restrictive parametric transformations or nonparametric representations that are computationally inefficient and cumbersome for implementation and theoretical analysis, which limits their usability in practice. This paper introduces a simple, general, and efficient strategy for joint posterior inference of an unknown transformation and all regression model parameters. The proposed approach directly targets the posterior distribution of the transformation by linking it with the marginal distributions of the independent and dependent variables, and then deploys a Bayesian nonparametric model via the Bayesian bootstrap. Crucially, this approach delivers (1) joint posterior consistency under general conditions, including multiple model misspecifications, and (2) efficient Monte Carlo (not Markov chain Monte Carlo) inference for the transformation and all parameters for important special cases. These tools apply across a variety of data domains, including real-valued, integer-valued, compactly-supported, and positive data. Simulation studies and an empirical application demonstrate the effectiveness and efficiency of this strategy for semiparametric Bayesian analysis with linear models, quantile regression, and Gaussian processes.
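The Bayesian bootstrap mentioned in the abstract is simple to sketch: each posterior draw reweights the observed data with Dirichlet$(1,\dots,1)$ weights (Rubin, 1981) and recomputes the functional of interest, with no Markov chain involved. The example below applies it to a sample mean; the paper applies the same device to the unknown transformation, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, scale=1.0, size=200)   # toy data, true mean 3

# Bayesian bootstrap: each posterior draw is a Dirichlet-weighted
# average of the observations (pure Monte Carlo, no MCMC).
draws = np.array([
    rng.dirichlet(np.ones(len(y))) @ y
    for _ in range(2000)
])
post_mean = draws.mean()
ci = np.quantile(draws, [0.025, 0.975])        # 95% credible interval
```

Because the draws are independent, uncertainty summaries like the credible interval come directly from quantiles of `draws`.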
    Maximally Machine-Learnable Portfolios. (arXiv:2306.05568v1 [econ.EM])
When it comes to stock returns, any form of predictability can bolster risk-adjusted profitability. We develop a collaborative machine learning algorithm that optimizes portfolio weights so that the resulting synthetic security is maximally predictable. Precisely, we introduce MACE, a multivariate extension of Alternating Conditional Expectations that achieves the aforementioned goal by wielding a Random Forest on one side of the equation, and a constrained Ridge Regression on the other. There are two key improvements with respect to Lo and MacKinlay's original maximally predictable portfolio approach. First, it accommodates any (nonlinear) forecasting algorithm and predictor set. Second, it handles large portfolios. We conduct exercises at the daily and monthly frequency and report significant increases in predictability and profitability using very little conditioning information. Interestingly, predictability is found in bad as well as good times, and MACE successfully navigates the debacle of 2022.
    Achieving the Pareto Frontier of Regret Minimization and Best Arm Identification in Multi-Armed Bandits. (arXiv:2110.08627v3 [cs.LG] UPDATED)
We study the Pareto frontier of two archetypal objectives in multi-armed bandits, namely, regret minimization (RM) and best arm identification (BAI) with a fixed horizon. It is folklore that the balance between exploitation and exploration is crucial for both RM and BAI, but exploration is more critical in achieving the optimal performance for the latter objective. To this end, we design and analyze the BoBW-lil'UCB$(\gamma)$ algorithm. Complementarily, by establishing lower bounds on the regret achievable by any algorithm with a given BAI failure probability, we show that (i) no algorithm can simultaneously perform optimally for both the RM and BAI objectives, and (ii) BoBW-lil'UCB$(\gamma)$ achieves order-wise optimal performance for RM or BAI under different values of $\gamma$. Our work elucidates the trade-off more precisely by showing how the constants in previous works depend on certain hardness parameters. Finally, we show that BoBW-lil'UCB outperforms a close competitor UCB$_\alpha$ (Degenne et al., 2019) in terms of the time complexity and the regret on diverse datasets such as MovieLens and Published Kinase Inhibitor Set.
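For readers unfamiliar with UCB-style indices, a minimal sketch of plain UCB1 on Bernoulli arms conveys the exploitation/exploration balance the abstract discusses. This is deliberately the textbook algorithm, not the paper's BoBW-lil'UCB$(\gamma)$; the confidence radius and reward model are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def ucb1(means, horizon):
    """Textbook UCB1 on Bernoulli arms: pull the arm maximizing
    empirical mean + sqrt(2 log t / pulls)."""
    k = len(means)
    counts = np.zeros(k)
    sums = np.zeros(k)
    for t in range(horizon):
        if t < k:
            arm = t                                   # initialize: pull each arm once
        else:
            ucb = sums / counts + np.sqrt(2 * np.log(t + 1) / counts)
            arm = int(np.argmax(ucb))
        sums[arm] += rng.random() < means[arm]        # Bernoulli reward
        counts[arm] += 1
    return counts

counts = ucb1([0.3, 0.5, 0.8], horizon=5000)
```

Over a fixed horizon the best arm accumulates the bulk of the pulls while suboptimal arms still receive the logarithmic exploration that BAI-oriented variants amplify.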
    Adaptive Conditional Quantile Neural Processes. (arXiv:2305.18777v2 [cs.LG] UPDATED)
Neural processes are a family of probabilistic models that inherit the flexibility of neural networks to parameterize stochastic processes. Despite providing well-calibrated predictions, especially in regression problems, and quick adaptation to new tasks, the Gaussian assumption that is commonly used to represent the predictive likelihood fails to capture more complicated distributions such as multimodal ones. To overcome this limitation, we propose Conditional Quantile Neural Processes (CQNPs), a new member of the neural processes family, which exploits the attractive properties of quantile regression in modeling the distributions irrespective of their form. By introducing an extension of quantile regression where the model learns to focus on estimating informative quantiles, we show that the sampling efficiency and prediction accuracy can be further enhanced. Our experiments with real and synthetic datasets demonstrate substantial improvements in predictive performance compared to the baselines, and better modeling of heterogeneous distributions' characteristics such as multimodality.
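The building block of quantile regression referenced above is the pinball loss, whose minimizer over a constant is the empirical quantile. The sketch below is illustrative only (it is not CQNPs); it verifies the quantile-recovery property by grid search on Gaussian data.

```python
import numpy as np

def pinball_loss(y, q_pred, tau):
    """Pinball (quantile) loss at level tau: tau * (y - q) for
    underestimates, (1 - tau) * (q - y) for overestimates."""
    diff = y - q_pred
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

# Minimizing pinball loss over a constant recovers the empirical quantile.
rng = np.random.default_rng(0)
y = rng.normal(size=10000)
grid = np.linspace(-3, 3, 601)
losses = [pinball_loss(y, c, tau=0.9) for c in grid]
q90_hat = grid[int(np.argmin(losses))]   # should approach the N(0,1) 0.9-quantile, ~1.2816
```

Fitting such losses at several levels $\tau$ at once is what lets quantile-based models represent asymmetric or multimodal predictive distributions without a Gaussian assumption.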
    Double-Weighting for Covariate Shift Adaptation. (arXiv:2305.08637v3 [stat.ML] UPDATED)
Supervised learning is often affected by a covariate shift in which the marginal distributions of instances (covariates $x$) of training and testing samples $\mathrm{p}_\text{tr}(x)$ and $\mathrm{p}_\text{te}(x)$ are different but the label conditionals coincide. Existing approaches address such covariate shift by either using the ratio $\mathrm{p}_\text{te}(x)/\mathrm{p}_\text{tr}(x)$ to weight training samples (reweighted methods) or using the ratio $\mathrm{p}_\text{tr}(x)/\mathrm{p}_\text{te}(x)$ to weight testing samples (robust methods). However, the performance of such approaches can be poor under support mismatch or when the above ratios take large values. We propose a minimax risk classification (MRC) approach for covariate shift adaptation that avoids such limitations by weighting both training and testing samples. In addition, we develop effective techniques that obtain both sets of weights and generalize the conventional kernel mean matching method. We provide novel generalization bounds for our method that show a significant increase in the effective sample size compared with reweighted methods. The proposed method also achieves enhanced classification performance in both synthetic and empirical experiments.
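The classical reweighted baseline that the abstract contrasts with can be sketched on a 1-D toy with known Gaussian marginals: multiply training-sample contributions by $w(x) = \mathrm{p}_\text{te}(x)/\mathrm{p}_\text{tr}(x)$. This is hedged illustration of the single-ratio baseline only; the paper's double-weighting (which also weights test samples) is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss_pdf(x, mu):
    # Unit-variance Gaussian density at mean mu.
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

# Covariate shift: training covariates ~ N(0, 1), test covariates ~ N(1, 1).
x_tr = rng.normal(0.0, 1.0, size=10000)
y_tr = np.sin(x_tr)                                  # toy regression target

# Classical importance weights p_te(x) / p_tr(x).
w = gauss_pdf(x_tr, 1.0) / gauss_pdf(x_tr, 0.0)

# Self-normalized weighted estimate of E_te[y] vs. the naive training average.
est_te = np.sum(w * y_tr) / np.sum(w)
est_tr = y_tr.mean()
```

The unweighted average estimates $\mathbb{E}_{\text{tr}}[\sin X] = 0$, while the weighted one tracks $\mathbb{E}_{\text{te}}[\sin X] = \sin(1)e^{-1/2} \approx 0.51$; when the shift is larger, the weights blow up, which is the failure mode double-weighting targets.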
    Causal Deep Reinforcement Learning Using Observational Data. (arXiv:2211.15355v2 [cs.LG] UPDATED)
Deep reinforcement learning (DRL) requires the collection of interventional data, which is sometimes expensive and even unethical in the real world, such as in autonomous driving and medicine. Offline reinforcement learning promises to alleviate this issue by exploiting the vast amount of observational data available in the real world. However, observational data may mislead the learning agent to undesirable outcomes if the behavior policy that generates the data depends on unobserved random variables (i.e., confounders). In this paper, we propose two deconfounding methods in DRL to address this problem. The methods first calculate the importance degree of different samples based on the causal inference technique, and then adjust the impact of different samples on the loss function by reweighting or resampling the offline dataset to ensure its unbiasedness. These deconfounding methods can be flexibly combined with existing model-free DRL algorithms such as soft actor-critic and deep Q-learning, provided that a weak condition can be satisfied by the loss functions of these algorithms. We prove the effectiveness of our deconfounding methods and validate them experimentally.
    Doubly Smoothed GDA for Constrained Nonconvex-Nonconcave Minimax Optimization. (arXiv:2212.12978v4 [math.OC] UPDATED)
Nonconvex-nonconcave minimax optimization has received intense attention over the last decade due to its broad applications in machine learning. Unfortunately, most existing algorithms cannot be guaranteed to converge globally and even suffer from limit cycles. To address this issue, we propose a novel single-loop algorithm called doubly smoothed gradient descent ascent method (DSGDA), which naturally balances the primal and dual updates. The proposed DSGDA can get rid of limit cycles in various challenging nonconvex-nonconcave examples in the literature, including Forsaken, Bilinearly-coupled minimax, Sixth-order polynomial, and PolarGame. We further show that under a one-sided Kurdyka-\L{}ojasiewicz condition with exponent $\theta\in(0,1)$ (resp. convex primal/concave dual function), DSGDA can find a game-stationary point with an iteration complexity of $\mathcal{O}(\epsilon^{-2\max\{2\theta,1\}})$ (resp. $\mathcal{O}(\epsilon^{-4})$). These match the best results for single-loop algorithms that solve nonconvex-concave or convex-nonconcave minimax problems, or problems satisfying the rather restrictive one-sided Polyak-\L{}ojasiewicz condition. Our work demonstrates, for the first time, the possibility of having a simple and unified single-loop algorithm for solving nonconvex-nonconcave, nonconvex-concave, and convex-nonconcave minimax problems.
    Detecting Adversarial Directions in Deep Reinforcement Learning to Make Robust Decisions. (arXiv:2306.05873v1 [cs.LG])
Learning in MDPs with highly complex state representations is currently possible due to multiple advancements in reinforcement learning algorithm design. However, this increase in complexity, together with the growth in observation dimensionality, comes at the cost of volatility that can be exploited via adversarial attacks (i.e. moving along worst-case directions in the observation space). To solve this policy instability problem we propose a novel method to detect the presence of these non-robust directions via local quadratic approximation of the deep neural policy loss. Our method provides a theoretical basis for the fundamental cut-off between safe observations and adversarial observations. Furthermore, our technique is computationally efficient, and does not depend on the methods used to produce the worst-case directions. We conduct extensive experiments in the Arcade Learning Environment with several different adversarial attack techniques. Most significantly, we demonstrate the effectiveness of our approach even in the setting where non-robust directions are explicitly optimized to circumvent our proposed method.
    Automatic Change-Point Detection in Time Series via Deep Learning. (arXiv:2211.03860v2 [stat.ML] UPDATED)
Detecting change-points in data is challenging because of the range of possible types of change and types of behaviour of data when there is no change. Statistically efficient methods for detecting a change will depend on both of these features, and it can be difficult for a practitioner to develop an appropriate detection method for their application of interest. We show how to automatically generate new offline detection methods based on training a neural network. Our approach is motivated by many existing tests for the presence of a change-point being representable by a simple neural network, and thus a neural network trained with sufficient data should have performance at least as good as these methods. We present theory that quantifies the error rate for such an approach, and how it depends on the amount of training data. Empirical results show that, even with limited training data, its performance is competitive with the standard CUSUM-based classifier for detecting a change in mean when the noise is independent and Gaussian, and can substantially outperform it in the presence of auto-correlated or heavy-tailed noise. Our method also shows strong results in detecting and localising changes in activity based on accelerometer data.
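The CUSUM baseline the abstract benchmarks against is easy to state: scan all split points $k$ and take the maximum standardized deviation $|S_k - \tfrac{k}{n}S_n| / \sqrt{k(n-k)/n}$. The sketch below is the classical statistic, not the paper's learned detector.

```python
import numpy as np

rng = np.random.default_rng(0)

def cusum_stat(x):
    """Max standardized CUSUM statistic for a single change in mean,
    and the location of the maximizing split point."""
    n = len(x)
    csum = np.cumsum(x)
    total = csum[-1]
    ks = np.arange(1, n)
    # |S_k - (k/n) S_n| / sqrt(k (n - k) / n) for k = 1, ..., n-1
    stats = np.abs(csum[:-1] - ks / n * total) / np.sqrt(ks * (n - ks) / n)
    return stats.max(), int(np.argmax(stats)) + 1

x_null = rng.normal(0, 1, 500)                                     # no change
x_alt = np.concatenate([rng.normal(0, 1, 250),
                        rng.normal(1.5, 1, 250)])                  # mean shift at 250

stat_null, _ = cusum_stat(x_null)
stat_alt, tau_hat = cusum_stat(x_alt)
```

Under a genuine mean shift the statistic is far larger than under the null, and its argmax localizes the change; CUSUM's weakness under auto-correlated or heavy-tailed noise is exactly where the learned detectors improve.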
    Policy Mirror Ascent for Efficient and Independent Learning in Mean Field Games. (arXiv:2212.14449v2 [math.OC] UPDATED)
Mean-field games have been used as a theoretical tool to obtain an approximate Nash equilibrium for symmetric and anonymous $N$-player games. However, limiting applicability, existing theoretical results assume variations of a "population generative model", which allows arbitrary modifications of the population distribution by the learning algorithm. Moreover, learning algorithms typically work on abstract simulators with population instead of the $N$-player game. Instead, we show that $N$ agents running policy mirror ascent converge to the Nash equilibrium of the regularized game within $\widetilde{\mathcal{O}}(\varepsilon^{-2})$ samples from a single sample trajectory without a population generative model, up to a standard $\mathcal{O}(\frac{1}{\sqrt{N}})$ error due to the mean field. Taking a divergent approach from the literature, instead of working with the best-response map we first show that a policy mirror ascent map can be used to construct a contractive operator having the Nash equilibrium as its fixed point. We analyze single-path TD learning for $N$-agent games, proving sample complexity guarantees by only using a sample path from the $N$-agent simulator without a population generative model. Furthermore, we demonstrate that our methodology allows for independent learning by $N$ agents with finite sample guarantees.
    How Sparse Can We Prune A Deep Network: A Geometric Viewpoint. (arXiv:2306.05857v1 [stat.ML])
Overparameterization constitutes one of the most significant hallmarks of deep neural networks. Though it can offer the advantage of outstanding generalization performance, it meanwhile imposes substantial storage burden, thus necessitating the study of network pruning. A natural and fundamental question is: How sparse can we prune a deep network (with almost no loss in performance)? To address this problem, in this work we take a first principles approach, specifically, by merely enforcing the sparsity constraint on the original loss function, we are able to characterize the sharp phase transition point of pruning ratio, which corresponds to the boundary between the feasible and the infeasible, from the perspective of high-dimensional geometry. It turns out that the phase transition point of pruning ratio equals the squared Gaussian width of some convex body resulting from the $l_1$-regularized loss function, normalized by the original dimension of parameters. As a byproduct, we provide a novel network pruning algorithm which is essentially a global one-shot pruning one. Furthermore, we provide efficient countermeasures to address the challenges in computing the involved Gaussian width, including the spectrum estimation of a large-scale Hessian matrix and dealing with a Hessian matrix that is not positive definite. It is demonstrated that the predicted pruning ratio threshold coincides very well with the actual value obtained from the experiments and our proposed pruning algorithm can achieve competitive or even better performance than the existing pruning algorithms. All codes are available at: https://github.com/QiaozheZhang/Global-One-shot-Pruning
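To ground the phrase "global one-shot pruning", here is the generic magnitude-based version: rank all weights across layers together and zero the smallest fraction in a single pass. This only illustrates the global, one-shot flavor; the paper's algorithm selects what to keep via its Gaussian-width analysis, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def global_one_shot_prune(weights, prune_ratio):
    """Zero the smallest prune_ratio fraction of weights, ranked by
    magnitude jointly across all layers, in one pass."""
    flat = np.concatenate([w.ravel() for w in weights])
    thresh = np.quantile(np.abs(flat), prune_ratio)      # global magnitude cutoff
    return [np.where(np.abs(w) >= thresh, w, 0.0) for w in weights]

# Two toy 'layers' of random weights, pruned to ~90% sparsity.
layers = [rng.standard_normal((16, 8)), rng.standard_normal((8, 4))]
pruned = global_one_shot_prune(layers, prune_ratio=0.9)
sparsity = sum((w == 0).sum() for w in pruned) / sum(w.size for w in pruned)
```

The paper's contribution is predicting, for a given loss landscape, the largest `prune_ratio` at which such one-shot sparsification remains feasible.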
    Using Image Transformations to Learn Network Structure. (arXiv:2112.03419v2 [stat.ML] UPDATED)
Many learning tasks require observing a sequence of images and making a decision. In a transportation problem of designing and planning for shipping boxes between nodes, we show how to treat the network of nodes and the flows between them as images. These images have useful structural information that can be statistically summarized. Using image compression techniques, we reduce an image down to a set of numbers that contain interpretable geographic information that we call geographic signatures. Using geographic signatures, we learn network structure that can be utilized to recommend future network connectivity. We develop a Bayesian reinforcement algorithm that takes advantage of statistically summarized network information as priors and user-decisions to reinforce an agent's probabilistic decision. Additionally, we show how reinforcement learning can be used with compression directly without interpretation in simple tasks.
    Reformulating van Rijsbergen's $F_{\beta}$ metric for weighted binary cross-entropy. (arXiv:2210.16458v2 [stat.ML] UPDATED)
The separation of performance metrics from gradient based loss functions may not always give optimal results and may miss vital aggregate information. This paper investigates incorporating a performance metric alongside differentiable loss functions to inform training outcomes. The goal is to guide model performance and interpretation by assuming statistical distributions on this performance metric for dynamic weighting. The focus is on van Rijsbergen's $F_{\beta}$ metric -- a popular choice for gauging classification performance. Through distributional assumptions on the $F_{\beta}$, an intermediary link can be established to the standard binary cross-entropy via dynamic penalty weights. First, the $F_{\beta}$ metric is reformulated to facilitate assuming statistical distributions with accompanying proofs for the cumulative distribution function. These probabilities are used within a knee curve algorithm to find an optimal $\beta$ or $\beta_{opt}$. This $\beta_{opt}$ is used as a weight or penalty in the proposed weighted binary cross-entropy. Experimentation on publicly available data with imbalanced classes mostly yields better and interpretable results as compared to the baseline. For example, for the IMDB text data with known labeling errors, a 14% boost is shown. This methodology can provide better interpretation.
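The two ingredients the abstract links can be written down directly: van Rijsbergen's $F_{\beta} = (1+\beta^2)\,\mathrm{tp} / ((1+\beta^2)\,\mathrm{tp} + \beta^2\,\mathrm{fn} + \mathrm{fp})$, and a binary cross-entropy with a class penalty weight. The sketch below shows both; the knee-curve procedure that selects $\beta_{opt}$ is the paper's contribution and is not reproduced, so `pos_weight` is a free illustrative parameter here.

```python
import numpy as np

def f_beta(y_true, y_pred, beta=1.0):
    """van Rijsbergen's F_beta from hard predictions."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    b2 = beta ** 2
    return (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)

def weighted_bce(y_true, p, pos_weight):
    """Binary cross-entropy with a penalty weight on the positive class --
    the role beta_opt plays in the paper's weighted loss."""
    eps = 1e-12
    return -np.mean(pos_weight * y_true * np.log(p + eps)
                    + (1 - y_true) * np.log(1 - p + eps))

y = np.array([1, 1, 1, 0, 0, 0, 0, 0])
yhat = np.array([1, 1, 0, 0, 0, 0, 1, 0])
f1 = f_beta(y, yhat, beta=1.0)           # tp=2, fp=1, fn=1 -> 2/3
loss = weighted_bce(y, np.clip(yhat.astype(float), 0.01, 0.99), pos_weight=2.0)
```

Raising `pos_weight` above 1 penalizes missed positives more heavily, which is how the dynamic weighting trades recall against precision during training.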
    Decentralized Randomly Distributed Multi-agent Multi-armed Bandit with Heterogeneous Rewards. (arXiv:2306.05579v1 [cs.LG])
We study a decentralized multi-agent multi-armed bandit problem in which multiple clients are connected by time dependent random graphs provided by an environment. The reward distributions of each arm vary across clients and rewards are generated independently over time by an environment based on distributions that include both sub-exponential and sub-gaussian distributions. Each client pulls an arm and communicates with neighbors based on the graph provided by the environment. The goal is to minimize the overall regret of the entire system through collaborations. To this end, we introduce a novel algorithmic framework, which first provides robust simulation methods for generating random graphs using rapidly mixing Markov chains or the random graph model, and then combines an averaging-based consensus approach with a newly proposed weighting technique and the upper confidence bound to deliver a UCB-type solution. Our algorithms account for the randomness in the graphs, removing the conventional doubly stochasticity assumption, and only require the knowledge of the number of clients at initialization. We derive optimal instance-dependent regret upper bounds of order $\log{T}$ in both sub-gaussian and sub-exponential environments, and a nearly optimal mean-gap independent regret upper bound of order $\sqrt{T}\log T$ up to a $\log T$ factor. Importantly, our regret bounds hold with high probability and capture graph randomness, whereas prior works consider expected regret under assumptions and require more stringent reward distributions.
    PFNs4BO: In-Context Learning for Bayesian Optimization. (arXiv:2305.17535v3 [cs.LG] UPDATED)
    In this paper, we use Prior-data Fitted Networks (PFNs) as a flexible surrogate for Bayesian Optimization (BO). PFNs are neural processes that are trained to approximate the posterior predictive distribution (PPD) through in-context learning on any prior distribution that can be efficiently sampled from. We describe how this flexibility can be exploited for surrogate modeling in BO. We use PFNs to mimic a naive Gaussian process (GP), an advanced GP, and a Bayesian Neural Network (BNN). In addition, we show how to incorporate further information into the prior, such as allowing hints about the position of optima (user priors), ignoring irrelevant dimensions, and performing non-myopic BO by learning the acquisition function. The flexibility underlying these extensions opens up vast possibilities for using PFNs for BO. We demonstrate the usefulness of PFNs for BO in a large-scale evaluation on artificial GP samples and three different hyperparameter optimization testbeds: HPO-B, Bayesmark, and PD1. We publish code alongside trained models at https://github.com/automl/PFNs4BO.  ( 2 min )
    Active Learning with Weak Supervision for Gaussian Processes. (arXiv:2204.08335v2 [stat.ML] UPDATED)
    Annotating data for supervised learning can be costly. When the annotation budget is limited, active learning can be used to select and annotate those observations that are likely to give the most gain in model performance. We propose an active learning algorithm that, in addition to selecting which observation to annotate, selects the precision of the annotation that is acquired. Assuming that annotations with low precision are cheaper to obtain, this allows the model to explore a larger part of the input space, with the same annotation budget. We build our acquisition function on the previously proposed BALD objective for Gaussian Processes, and empirically demonstrate the gains of being able to adjust the annotation precision in the active learning loop.  ( 2 min )
    Efficient Learning for Selecting Top-m Context-Dependent Designs. (arXiv:2305.04086v2 [stat.ML] UPDATED)
    We consider a simulation optimization problem for context-dependent decision-making, which aims to determine the top-$m$ designs for all contexts. Under a Bayesian framework, we formulate the optimal dynamic sampling decision as a stochastic dynamic programming problem, and develop a sequential sampling policy to efficiently learn the performance of each design under each context. The asymptotically optimal sampling ratios are derived to attain the optimal large-deviations rate for the worst-case probability of false selection. The proposed sampling policy is proved to be consistent, and its sampling ratios are asymptotically optimal. Numerical experiments demonstrate that the proposed method improves the efficiency of selecting the top-$m$ context-dependent designs.  ( 2 min )
    Differentially Private Image Classification by Learning Priors from Random Processes. (arXiv:2306.06076v1 [cs.CV])
    In privacy-preserving machine learning, differentially private stochastic gradient descent (DP-SGD) performs worse than SGD due to per-sample gradient clipping and noise addition. A recent focus in private learning research is improving the performance of DP-SGD on private data by incorporating priors that are learned on real-world public data. In this work, we explore how we can improve the privacy-utility tradeoff of DP-SGD by learning priors from images generated by random processes and transferring these priors to private data. We propose DP-RandP, a three-phase approach. We attain new state-of-the-art accuracy when training from scratch on CIFAR10, CIFAR100, and MedMNIST for a range of privacy budgets $\varepsilon \in [1, 8]$. In particular, we improve the previous best reported accuracy on CIFAR10 from $60.6 \%$ to $72.3 \%$ for $\varepsilon=1$. Our code is available at https://github.com/inspire-group/DP-RandP.  ( 2 min )
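    The two operations that cost DP-SGD its utility -- per-sample gradient clipping and noise addition -- can be sketched generically (a textbook illustration, not the DP-RandP code):

```python
import numpy as np

def dp_sgd_step(per_sample_grads, clip_norm, noise_multiplier, rng):
    # Clip each per-sample gradient to L2 norm <= clip_norm, sum the
    # clipped gradients, add Gaussian noise scaled by the clip norm,
    # and average over the batch.
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_sample_grads]
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_sample_grads)
```

    Clipping bounds each example's influence (sensitivity), so the added Gaussian noise yields a differential privacy guarantee; both the clipping bias and the noise are what good priors help a model tolerate.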
    Explaining Predictive Uncertainty with Information Theoretic Shapley Values. (arXiv:2306.05724v1 [stat.ML])
    Researchers in explainable artificial intelligence have developed numerous methods for helping users understand the predictions of complex supervised learning models. By contrast, explaining the $\textit{uncertainty}$ of model outputs has received relatively little attention. We adapt the popular Shapley value framework to explain various types of predictive uncertainty, quantifying each feature's contribution to the conditional entropy of individual model outputs. We consider games with modified characteristic functions and find deep connections between the resulting Shapley values and fundamental quantities from information theory and conditional independence testing. We outline inference procedures for finite sample error rate control with provable guarantees, and implement an efficient algorithm that performs well in a range of experiments on real and simulated data. Our method has applications to covariate shift detection, active learning, feature selection, and active feature-value acquisition.  ( 2 min )
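    For intuition, Shapley values for a small game can be computed exactly by averaging marginal contributions over player orderings; here `value_fn` is a placeholder for the paper's entropy-based characteristic functions:

```python
from itertools import permutations

def shapley_values(players, value_fn):
    # Exact Shapley values by enumerating orderings (feasible for small n).
    # value_fn maps a frozenset coalition to its characteristic value.
    phi = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        coalition = frozenset()
        for p in order:
            phi[p] += value_fn(coalition | {p}) - value_fn(coalition)
            coalition = coalition | {p}
    return {p: v / len(perms) for p, v in phi.items()}
```

    For an additive game, each player's Shapley value equals its own contribution, which makes a handy sanity check; real feature-attribution use cases rely on sampling approximations instead of full enumeration.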
    Automating Model Comparison in Factor Graphs. (arXiv:2306.05965v1 [cs.LG])
    Bayesian state and parameter estimation have been automated effectively in the literature; however, this has not yet been the case for model comparison, which therefore still requires error-prone and time-consuming manual derivations. As a result, model comparison is often overlooked, despite its importance. This paper efficiently automates Bayesian model averaging, selection, and combination by message passing on a Forney-style factor graph with a custom mixture node. Parameter and state inference and model comparison can then be executed simultaneously using message passing with scale factors. This approach shortens the model design cycle and allows for a straightforward extension to hierarchical and temporal model priors to accommodate the modeling of complicated time-varying processes.  ( 2 min )
    Path Neural Networks: Expressive and Accurate Graph Neural Networks. (arXiv:2306.05955v1 [cs.LG])
    Graph neural networks (GNNs) have recently become the standard approach for learning with graph-structured data. Prior work has shed light into their potential, but also their limitations. Unfortunately, it was shown that standard GNNs are limited in their expressive power. These models are no more powerful than the 1-dimensional Weisfeiler-Leman (1-WL) algorithm in terms of distinguishing non-isomorphic graphs. In this paper, we propose Path Neural Networks (PathNNs), a model that updates node representations by aggregating paths emanating from nodes. We derive three different variants of the PathNN model that aggregate single shortest paths, all shortest paths and all simple paths of length up to K. We prove that two of these variants are strictly more powerful than the 1-WL algorithm, and we experimentally validate our theoretical results. We find that PathNNs can distinguish pairs of non-isomorphic graphs that are indistinguishable by 1-WL, while our most expressive PathNN variant can even distinguish between 3-WL indistinguishable graphs. The different PathNN variants are also evaluated on graph classification and graph regression datasets, where in most cases, they outperform the baseline methods.  ( 2 min )
    Learning with symmetric positive definite matrices via generalized Bures-Wasserstein geometry. (arXiv:2110.10464v2 [math.FA] UPDATED)
    Learning with symmetric positive definite (SPD) matrices has many applications in machine learning. Consequently, understanding the Riemannian geometry of SPD matrices has attracted much attention lately. A particular Riemannian geometry of interest is the recently proposed Bures-Wasserstein (BW) geometry, which builds on the Wasserstein distance between Gaussian densities. In this paper, we propose a novel generalization of the BW geometry, which we call the GBW geometry. The proposed generalization is parameterized by a symmetric positive definite matrix $\mathbf{M}$ such that when $\mathbf{M} = \mathbf{I}$, we recover the BW geometry. We provide a rigorous treatment of various differential-geometric notions on the proposed generalized geometry, which makes it amenable to various machine learning applications. We also present experiments that illustrate the efficacy of the proposed GBW geometry over the BW geometry.  ( 2 min )
    Debiasing Conditional Stochastic Optimization. (arXiv:2304.10613v2 [cs.LG] UPDATED)
    In this paper, we study the conditional stochastic optimization (CSO) problem which covers a variety of applications including portfolio selection, reinforcement learning, robust learning, causal inference, etc. The sample-averaged gradient of the CSO objective is biased due to its nested structure, and therefore requires a high sample complexity to reach convergence. We introduce a general stochastic extrapolation technique that effectively reduces the bias. We show that for nonconvex smooth objectives, combining this extrapolation with variance reduction techniques can achieve a significantly better sample complexity than existing bounds. Additionally, we develop new algorithms for the finite-sum variant of the CSO problem that also significantly improve upon existing results. Finally, we believe that our debiasing technique has the potential to be a useful tool for addressing similar challenges in other stochastic optimization problems.  ( 2 min )
    CI-GNN: A Granger Causality-Inspired Graph Neural Network for Interpretable Brain Network-Based Psychiatric Diagnosis. (arXiv:2301.01642v2 [stat.ML] UPDATED)
    There is a recent trend to leverage the power of graph neural networks (GNNs) for brain-network-based psychiatric diagnosis, which, in turn, motivates an urgent need for psychiatrists to fully understand the decision behavior of the GNNs used. However, most existing GNN explainers are either post-hoc, in which another interpretive model must be created to explain a well-trained GNN, or do not consider the causal relationship between the extracted explanation and the decision, such that the explanation itself contains spurious correlations and suffers from weak faithfulness. In this work, we propose a Granger-causality-inspired graph neural network (CI-GNN), a built-in interpretable model that is able to identify the most influential subgraph (i.e., functional connectivity within brain regions) that is causally related to the decision (e.g., major depressive disorder patients versus healthy controls), without training an auxiliary interpretive network. CI-GNN learns disentangled subgraph-level representations $\alpha$ and $\beta$ that encode, respectively, the causal and noncausal aspects of the original graph under a graph variational autoencoder framework, regularized by a conditional mutual information (CMI) constraint. We theoretically justify the validity of the CMI regularization in capturing the causal relationship. We also empirically evaluate CI-GNN against three baseline GNNs and four state-of-the-art GNN explainers on synthetic data and three large-scale brain disease datasets. We observe that CI-GNN achieves the best performance across a wide range of metrics and provides more reliable and concise explanations that have clinical evidence.  ( 3 min )
    Differentially Private Optimization for Smooth Nonconvex ERM. (arXiv:2302.04972v2 [cs.LG] UPDATED)
    We develop simple differentially private optimization algorithms that move along directions of (expected) descent to find an approximate second-order solution for nonconvex ERM. We use line search, mini-batching, and a two-phase strategy to improve the speed and practicality of the algorithm. Numerical experiments demonstrate the effectiveness of these approaches.  ( 2 min )
    Extending Kernel PCA through Dualization: Sparsity, Robustness and Fast Algorithms. (arXiv:2306.05815v1 [cs.LG])
    The goal of this paper is to revisit Kernel Principal Component Analysis (KPCA) through dualization of a difference of convex functions. This allows KPCA to be naturally extended to multiple objective functions and leads to efficient gradient-based algorithms that avoid the expensive SVD of the Gram matrix. In particular, we consider objective functions that can be written as Moreau envelopes, demonstrating how to promote robustness and sparsity within the same framework. The proposed method is evaluated on synthetic and real-world benchmarks, showing significant speedup in KPCA training time as well as highlighting the benefits in terms of robustness and sparsity.  ( 2 min )
    Improving Estimation of the Koopman Operator with Kolmogorov-Smirnov Indicator Functions. (arXiv:2306.05945v1 [physics.data-an])
    It has become common to perform kinetic analysis using approximate Koopman operators that transform high-dimensional time series of observables into ranked dynamical modes. Key to the practical success of the approach is the identification of a set of observables which form a good basis in which to expand the slow relaxation modes. Good observables are, however, difficult to identify a priori, and sub-optimal choices can lead to significant underestimation of characteristic timescales. Leveraging the representation of slow dynamics in terms of a Hidden Markov Model (HMM), we propose a simple and computationally efficient clustering procedure to infer surrogate observables that form a good basis for slow modes. We apply the approach to an analytically solvable model system, as well as to three protein systems of different complexities. We consistently demonstrate that the inferred indicator functions can significantly improve the estimation of the leading eigenvalues of the Koopman operators and correctly identify key states and transition timescales of stochastic systems, even when good observables are not known a priori.  ( 2 min )
    Estimation of Ridge Using Nonlinear Transformation on Density Function. (arXiv:2306.05722v1 [cs.LG])
    Ridges play a vital role in accurately approximating the underlying structure of manifolds. In this paper, we explore how the ridge varies when a concave nonlinear transformation is applied to the density function. Through the derivation of the Hessian matrix, we observe that nonlinear transformations yield a rank-one modification of the Hessian matrix. Leveraging the variational properties of eigenvalue problems, we establish a partial-order inclusion relationship among the corresponding ridges. We find that the transformation can lead to improved estimation of the tangent space via this rank-one modification of the Hessian matrix. To validate our theory, we conduct extensive numerical experiments on synthetic and real-world datasets that demonstrate the superiority of the ridges obtained from our transformed approach in approximating the true underlying manifold compared to other manifold-fitting algorithms.  ( 2 min )
    Optimal Variable Clustering for High-Dimensional Matrix Valued Data. (arXiv:2112.12909v2 [stat.ML] UPDATED)
    Matrix-valued data has become increasingly prevalent in many applications. Most of the existing clustering methods for this type of data are tailored to the mean model and do not account for the dependence structure of the features, which can be very informative, especially in high-dimensional settings. To extract the information from the dependence structure for clustering, we propose a new latent variable model for the features arranged in matrix form, with some unknown membership matrices representing the clusters for the rows and columns. Under this model, we further propose a class of hierarchical clustering algorithms using the difference of a weighted covariance matrix as the dissimilarity measure. Theoretically, we show that under mild conditions, our algorithm attains clustering consistency in the high-dimensional setting. While this consistency result holds for our algorithm with a broad class of weighted covariance matrices, the conditions for this result depend on the choice of the weight. To investigate how the weight affects the theoretical performance of our algorithm, we establish the minimax lower bound for clustering under our latent variable model. Given these results, we identify the optimal weight in the sense that using this weight guarantees our algorithm to be minimax rate-optimal in terms of the magnitude of some cluster separation metric. The practical implementation of our algorithm with the optimal weight is also discussed. Finally, we conduct simulation studies to evaluate the finite sample performance of our algorithm and apply the method to a genomic dataset.  ( 3 min )
    Distributed Consensus Algorithm for Decision-Making in Multi-agent Multi-armed Bandit. (arXiv:2306.05998v1 [cs.LG])
    We study a structured multi-agent multi-armed bandit (MAMAB) problem in a dynamic environment. A graph reflects the information-sharing structure among agents, and the arms' reward distributions are piecewise-stationary with several unknown change points. The agents face an identical piecewise-stationary MAB problem. The goal is to develop a decision-making policy for the agents that minimizes the regret, which is the expected total loss of not playing the optimal arm at each time step. Our proposed solution, Restarted Bayesian Online Change Point Detection in Cooperative Upper Confidence Bound Algorithm (RBO-Coop-UCB), involves an efficient multi-agent UCB algorithm as its core, enhanced with a Bayesian change point detector. We also develop a simple restart-based decision cooperation mechanism that improves decision-making. Theoretically, we establish that the expected group regret of RBO-Coop-UCB is upper bounded by $\mathcal{O}(KNM\log T + K\sqrt{MT\log T})$, where $K$ is the number of agents, $M$ is the number of arms, and $T$ is the number of time steps. Numerical experiments on synthetic and real-world datasets demonstrate that our proposed method outperforms state-of-the-art algorithms.  ( 2 min )
    Efficient Uncertainty Quantification and Reduction for Over-Parameterized Neural Networks. (arXiv:2306.05674v1 [stat.ML])
    Uncertainty quantification (UQ) is important for reliability assessment and enhancement of machine learning models. In deep learning, uncertainties arise not only from data, but also from the training procedure that often injects substantial noises and biases. These hinder the attainment of statistical guarantees and, moreover, impose computational challenges on UQ due to the need for repeated network retraining. Building upon the recent neural tangent kernel theory, we create statistically guaranteed schemes to principally \emph{quantify}, and \emph{remove}, the procedural uncertainty of over-parameterized neural networks with very low computational effort. In particular, our approach, based on what we call a procedural-noise-correcting (PNC) predictor, removes the procedural uncertainty by using only \emph{one} auxiliary network that is trained on a suitably labeled data set, instead of many retrained networks employed in deep ensembles. Moreover, by combining our PNC predictor with suitable light-computation resampling methods, we build several approaches to construct asymptotically exact-coverage confidence intervals using as few as four trained networks without additional overheads.  ( 2 min )
    Prodigy: An Expeditiously Adaptive Parameter-Free Learner. (arXiv:2306.06101v1 [cs.LG])
    We consider the problem of estimating the learning rate in adaptive methods, such as Adagrad and Adam. We describe two techniques, Prodigy and Resetting, to provably estimate the distance to the solution $D$, which is needed to set the learning rate optimally. Our techniques are modifications of the D-Adaptation method for learning-rate-free learning. Our methods improve upon the convergence rate of D-Adaptation by a factor of $O(\sqrt{\log(D/d_0)})$, where $d_0$ is the initial estimate of $D$. We test our methods on 12 common logistic-regression benchmark datasets, VGG11 and ResNet-50 training on CIFAR10, ViT training on Imagenet, LSTM training on IWSLT14, DLRM training on Criteo dataset, VarNet on Knee MRI dataset, as well as RoBERTa and GPT transformer training on BookWiki. Our experimental results show that our approaches consistently outperform D-Adaptation and reach test accuracy values close to that of hand-tuned Adam.  ( 2 min )
    Quartile-Based Seasonality Decomposition for Time Series Forecasting and Anomaly Detection. (arXiv:2306.05989v1 [cs.LG])
    The timely detection of anomalies is essential in the telecom domain as it facilitates the identification and characterization of irregular patterns, abnormal behaviors, and network anomalies, contributing to enhanced service quality and operational efficiency. Precisely forecasting and eliminating predictable time series patterns constitutes a vital component of time series anomaly detection. While the state-of-the-art methods aim to maximize forecasting accuracy, the computational performance takes a hit. In a system composed of a large number of time series variables, e.g., cell Key Performance Indicators (KPIs), the time and space complexity of the forecasting employed is of crucial importance. Quartile-Based Seasonality Decomposition (QBSD) is a live forecasting method proposed in this paper to make an optimal trade-off between computational complexity and forecasting accuracy. This paper compares the performance of QBSD to the state-of-the-art forecasting methods and their applicability to practical anomaly detection. To demonstrate the efficacy of the proposed solution, experimental evaluation was conducted using publicly available datasets as well as a telecom KPI dataset.  ( 2 min )
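    The abstract does not spell out QBSD's exact procedure, but a quartile-based seasonal baseline in the same spirit -- fold the series by its period, take per-phase quartiles, and flag points outside an IQR envelope -- can be sketched as follows (the names and the 1.5-IQR rule are assumptions, not the paper's method):

```python
import numpy as np

def quartile_seasonal_bands(series, period):
    # Per-phase Q1/median/Q3 across seasonal cycles: a cheap
    # normal-behavior envelope requiring no model fitting.
    s = np.asarray(series, dtype=float)
    n_cycles = len(s) // period
    folded = s[:n_cycles * period].reshape(n_cycles, period)
    q1, med, q3 = np.percentile(folded, [25, 50, 75], axis=0)
    return q1, med, q3

def flag_anomalies(series, period, k=1.5):
    # Flag points that fall outside the per-phase interquartile envelope.
    q1, _, q3 = quartile_seasonal_bands(series, period)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    s = np.asarray(series, dtype=float)
    idx = np.arange(len(s)) % period
    return (s < lo[idx]) | (s > hi[idx])
```

    The appeal for large KPI fleets is complexity: one pass over the data per series, no iterative optimization.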
    Bayes optimal learning in high-dimensional linear regression with network side information. (arXiv:2306.05679v1 [math.ST])
    Supervised learning problems with side information in the form of a network arise frequently in applications in genomics, proteomics and neuroscience. For example, in genetic applications, the network side information can accurately capture background biological information on the intricate relations among the relevant genes. In this paper, we initiate a study of Bayes optimal learning in high-dimensional linear regression with network side information. To this end, we first introduce a simple generative model (called the Reg-Graph model) which posits a joint distribution for the supervised data and the observed network through a common set of latent parameters. Next, we introduce an iterative algorithm based on Approximate Message Passing (AMP) which is provably Bayes optimal under very general conditions. In addition, we characterize the limiting mutual information between the latent signal and the data observed, and thus precisely quantify the statistical impact of the network side information. Finally, supporting numerical experiments suggest that the introduced algorithm has excellent performance in finite samples.  ( 2 min )
    Boosting with Tempered Exponential Measures. (arXiv:2306.05487v1 [cs.LG])
    One of the most popular ML algorithms, AdaBoost, can be derived from the dual of a relative entropy minimization problem subject to the constraint that the positive weights on the examples sum to one. Essentially, harder examples receive higher probabilities. We generalize this setup to the recently introduced tempered exponential measures (TEMs), where normalization is enforced on a specific power of the measure and not the measure itself. TEMs are indexed by a parameter $t$ and generalize exponential families ($t=1$). Our algorithm, $t$-AdaBoost, recovers AdaBoost as a special case ($t=1$). We show that $t$-AdaBoost retains AdaBoost's celebrated exponential convergence rate when $t\in [0,1)$, while allowing a slight improvement of the rate's hidden constant compared to $t=1$. $t$-AdaBoost partially computes on a generalization of classical arithmetic over the reals and brings notable properties like guaranteed bounded leveraging coefficients for $t\in [0,1)$. From the loss that $t$-AdaBoost minimizes (a generalization of the exponential loss), we show how to derive a new family of tempered losses for the induction of domain-partitioning classifiers like decision trees. Crucially, strict properness is ensured for all of them, while their boosting rates span the full known spectrum. Experiments using $t$-AdaBoost+trees show that significant leverage can be achieved by tuning $t$.  ( 2 min )
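    The $t=1$ special case is plain AdaBoost; one reweighting round -- weighted error, leveraging coefficient, exponential reweighting -- looks like this (a standard textbook sketch, not the paper's $t$-AdaBoost):

```python
import math

def adaboost_round(weights, correct):
    # One AdaBoost round: compute the weighted error of the weak learner,
    # derive the leveraging coefficient alpha, and exponentially reweight
    # so that misclassified examples gain probability mass.
    eps = sum(w for w, c in zip(weights, correct) if not c)
    eps = min(max(eps, 1e-12), 1 - 1e-12)  # guard against 0 or 1 error
    alpha = 0.5 * math.log((1 - eps) / eps)
    new = [w * math.exp(-alpha if c else alpha)
           for w, c in zip(weights, correct)]
    z = sum(new)  # renormalize so the weights sum to one
    return alpha, [w / z for w in new]
```

    After one round the misclassified examples hold exactly half the total weight, which forces the next weak learner to focus on them; TEMs change which power of the measure this normalization is enforced on.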
    Task-specific experimental design for treatment effect estimation. (arXiv:2306.05484v1 [stat.ME])
    Understanding causality should be a core requirement of any attempt to build real impact through AI. Due to the inherent unobservability of counterfactuals, large randomised controlled trials (RCTs) are the standard for causal inference. But large experiments are generally expensive, and randomisation carries its own costs, e.g. when suboptimal decisions are trialed. Recent work has proposed more sample-efficient alternatives to RCTs, but these are not adaptable to the downstream application for which the causal effect is sought. In this work, we develop a task-specific approach to experimental design and derive sampling strategies customised to particular downstream applications. Across a range of important tasks, real-world datasets, and sample sizes, our method outperforms other benchmarks, e.g. requiring an order of magnitude less data to match RCT performance on targeted marketing tasks.  ( 2 min )

  • Open

    Elvis behaving badly
    submitted by /u/Only-Control5926 [link] [comments]  ( 7 min )
    AI is getting scarily advanced.
    submitted by /u/Cucumberjoes [link] [comments]  ( 7 min )
    Request for Help: Code Generative AI vs Data Generative AI
    I have a large warehouse database that contains over 1k tables. I want to be able to use AI to generate SQL queries, stored procedures, and functions from a text prompt, like we do with ChatGPT. I could use ChatGPT, but there are many limitations, not in the way I get answers but in the amount of data (tokens) I can provide and receive before the AI loses the context of my database tables and schema. I want a system that can learn my database tables and take them into consideration every time I ask specific questions. I can provide as much information as possible to the AI (tables, columns, possible values...) to get me as close as it can to the final result. I found a few machine learning systems like MindsDB, but they all work with data prediction through AI tables and are not focused on the DDL and DML to generate code. If you have any thoughts on this, please help and share :). Thank you. submitted by /u/atryeba [link] [comments]  ( 8 min )
    Urgent question about janitor ai
    So I just started using Janitor AI and I need to know whether it has unlimited chats or not, and if it doesn't, do they refresh, kind of like Chai? submitted by /u/Acrobatic-Bowler-556 [link] [comments]  ( 8 min )
    Hmm
    submitted by /u/thunderstonetopikas [link] [comments]  ( 8 min )
    Artificial intelligence solving bureaucratic bottleneck?
    Is this a sensitive and loaded subject? Well, I know that there are good reasons why bureaucracy is meant to be slow, and that fast bureaucracy usually means autocracy. And the worst aspects of fast bureaucracy are corruption, greed, and human ego. Will AI somehow make well-oiled bureaucracy possible, without corruption and misuse of power? submitted by /u/Absolute-Nobody0079 [link] [comments]  ( 8 min )
    Looking for AI to clean up old video
    Has anyone seen an AI tool that will clean up old home video (color correction, grain removal, upscaling, fixing lines, etc.) from old VHS or 8mm? Has anyone seen any tool like that, or somewhere I could start and modify to my needs? submitted by /u/lmaccaro [link] [comments]  ( 8 min )
    Viva Las Vegas with everything AI (disclaimer: the music is really by the King)
    submitted by /u/Only-Control5926 [link] [comments]  ( 8 min )
  • Open

    [P] Improve AI 8.0: Free Contextual Multi-Armed Bandit Platform for Scoring, Ranking & Decisions
    Improve AI 8.0 - Contextual Multi-Armed Bandit Platform for Scoring, Ranking & Decisions Full announcement post at: https://improve.ai/2023/06/08/contextual-bandit.html We’re thrilled to introduce Improve AI 8.0, a modern, free, production-ready contextual multi-armed bandit platform that quickly scores and ranks items using intuitive reward-based training. Multi-armed bandits and contextual bandits are cornerstone machine learning algorithms that power a myriad of applications including recommendation systems, personalization, query re-ranking, automated decisions, and multi-variate optimization. With version 8, we’ve fully delivered on our original vision - providing a high-performance, simple-to-use, low-cost contextual multi-armed bandit platform. Key features of v8.0 include: …  ( 10 min )
    REINFORCE changing the objective[D]
    Consider the REINFORCE algorithm, where we would like to maximize the expected reward. There we just apply the log trick and things work out smoothly. Let us instead say I want to minimize the difference between my expected reward and some constant c. For instance, something like minimize (E(R) - C)^2. The hope is to get rewards close to C on average. How do I frame the objective -- directly compute E(R) over n samples, or are there more details I need to worry about? My initial idea is to just compute the MSE loss and do backprop. Bonus: How do I now reduce the variance of this estimate, and what baseline do I use? The same as in the original REINFORCE algorithm? If so, why? Just a rough intuition. Also, what would you suggest -- use some sort of value function for the baseline? Thanks. P.S.: Mainly interested in framing the objective. submitted by /u/ashblue21 [link] [comments]  ( 8 min )
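    One way to frame this objective: by the chain rule, the gradient of (E[R] - C)^2 is 2 (E[R] - C) * dE[R]/dtheta, and dE[R]/dtheta can be estimated with the usual REINFORCE score-function estimator, mean-reward baseline included (the baseline stays unbiased since E[grad log pi] = 0). A hedged numpy sketch, where the batch mean doubles as the plug-in estimate of E[R]:

```python
import numpy as np

def grad_squared_target(grad_log_probs, rewards, target_c):
    # Score-function estimate of d/dtheta (E[R] - C)^2:
    # 2 * (E[R] - C) * E[(R - b) * grad log pi], with baseline b = mean(R).
    r = np.asarray(rewards, dtype=float)
    g = np.asarray(grad_log_probs, dtype=float)  # shape (n_samples, n_params)
    b = r.mean()  # variance-reduction baseline, as in vanilla REINFORCE
    reinforce_grad = ((r - b)[:, None] * g).mean(axis=0)
    return 2.0 * (r.mean() - target_c) * reinforce_grad
```

    Note the batch mean appears twice: once as the plug-in estimate of E[R] in the outer factor and once as the baseline; in an autodiff implementation both would be detached so gradients flow only through log pi.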
    [D] Training a Large Acoustic Model Similarly To A Large Language Model?
    I'm very interested in acoustic embeddings. The state of the art with text embeddings is based on the success of "large language models" (big models, transformer/attention, self-supervision, and vast data). Is there a state-of-the-art equivalent "large acoustic model" that is not restricted to speech? There certainly are domain-specific acoustic successes: Whisper for speech, Merlin for birds, and humpback whale calls on tensorflow-hub. I'm wondering if there is a similar foundational approach to an LLM for acoustics, a "Large Acoustic Model" that is trained on a wide set of sounds in a self-supervised way so that it can successfully embed (nearly) arbitrary audio for downstream tasks. Right now I can understand my wife talking, a lawn mower droning in the background, an occasional car driving by, and several distinct birds chirping/squawking. That audio stream contains broadband acoustic noise and transient content at different time scales. My brain deals with it. It seems reasonable that a LAM should be able to embed all that audio appropriately. Suppose you had TB of data that covers all those domains (environmental, biological, mechanical, speech, music, transients, etc.). How would you train a foundational "Large Acoustic Model" to produce appropriate embeddings? submitted by /u/Simusid [link] [comments]  ( 8 min )
    [P] New book explaining more advanced concepts in machine learning, deep learning, and AI
    After working on it for more than a year and countless weekends, my new book "Machine Learning Q and AI" is now finally complete. Lots of people who read my previous books ("Python Machine Learning" and "Machine Learning with PyTorch and Scikit-Learn") reached out to me, asking for more advanced follow-up material. Also, I received a lot of nice feedback on my blog and social media threads, where I often explain various ML-related concepts. So, I thought to combine the two and cover 30 more advanced concepts in this book. The topics include things like explanations of multi-GPU training paradigms, using and finetuning transformers, and differences between encoder- and decoder-style LLMs. (The explanations are conceptual, without code examples and mathematical equations; however, I include several code examples in the supplementary materials.) PS: A paperback version will also follow later this summer. submitted by /u/seraschka [link] [comments]  ( 8 min )
    [Project] treebomination: convert a scikit-learn decision tree into a Keras model
    submitted by /u/Dobias [link] [comments]  ( 8 min )
    [R] [ICASSP 2023] Towards Improved Room Impulse Response Estimation for Speech Recognition
    submitted by /u/Snoo63916 [link] [comments]  ( 8 min )
    [N] Google's SGE is a Plagiarism Engine That Could Break the Internet
    submitted by /u/geekinchief [link] [comments]  ( 8 min )
    [R] Kindly read my research article on establishing material properties of graphene using machine learning interatomic potentials.
    https://authors.elsevier.com/c/1hCgV3In-uzEuf Free link to the paper posted above. I hope you guys will like it. submitted by /u/Outrageous-Art3649 [link] [comments]  ( 8 min )
    [D] Are excess risk decomposition & the bias-variance tradeoff talking about the same thing?
    Are excess risk decomposition and the bias-variance tradeoff describing the same thing? I came across this beautiful mathematical decomposition of risk into approximation error & estimation error in one of the courses on statistical learning theory. It somehow got me thinking about the bias-variance tradeoff that we casually talk about in machine learning. Any light on this? submitted by /u/Vishesh1597 [link] [comments]  ( 8 min )
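    They are closely related but not identical. A sketch of the decomposition, with $\mathcal{F}$ the hypothesis class, $f^*$ the Bayes predictor, $f_{\mathcal{F}}$ the best predictor in $\mathcal{F}$, and $\hat{f}$ the learned predictor:

    ```latex
    R(\hat{f}) - R(f^*)
      = \underbrace{R(\hat{f}) - R(f_{\mathcal{F}})}_{\text{estimation error}}
      + \underbrace{R(f_{\mathcal{F}}) - R(f^*)}_{\text{approximation error}}
    ```

    Loosely, estimation error plays the role of variance (it shrinks with more data) and approximation error plays the role of bias (it shrinks with a richer class), but the classical bias-variance decomposition is a statement about squared error of a specific estimator, while excess risk decomposition holds for general losses.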
    r/MachineLearning is joining the Reddit Blackout starting June 12th
    Hi folks, At this point you all are probably well aware of the shenanigans Reddit has been pulling regarding their announced API changes. These changes are forcing many third party apps to shut down, including Apollo, Reddit is Fun, Sync, Narwhal, and many more. Many of the mods here, including me, use one of these apps to help moderate the sub. Furthermore, it's now clear that Reddit is not acting in good faith. This includes falsely accusing the creator of Apollo of extortion, ignoring app developers' requests to communicate while claiming to be working with devs, and requiring devs who make accessibility-focused apps to do so for free! This mirrors the philosophy they have for moderation: have volunteers provide millions of hours of unpaid labor for Reddit. We previously asked the community if we should join the planned Reddit blackout and the answer was a resounding yes. So, that's what we plan to do. We feel there are enough other platforms for machine learning discussion (Hacker News, Twitter, Mastodon, etc.) that people can migrate there in the meantime until Reddit reassesses their latest policy decisions. We hope to see you all on the other side. Sincerely, Your r/MachineLearning moderators submitted by /u/dojoteef [link] [comments]  ( 8 min )
    [R] Interview with Pascal Hitzler: The Rise of NSAI, Explainability, Concept...
    submitted by /u/Neurosymbolic [link] [comments]  ( 8 min )
  • Open

    YOLO Model Explained
    submitted by /u/Personal-Trainer-541 [link] [comments]  ( 7 min )

  • Open

    Implementing Gradient Descent in PyTorch
    The gradient descent algorithm is one of the most popular techniques for training deep neural networks. It has many applications in fields such as computer vision, speech recognition, and natural language processing. While the idea of gradient descent has been around for decades, it’s only recently that it’s been applied to applications related to deep […] The post Implementing Gradient Descent in PyTorch appeared first on MachineLearningMastery.com.  ( 25 min )
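    The post's code is in PyTorch; as a framework-free sketch of the same idea, here is gradient descent on a one-variable least-squares problem, with the gradients written out by hand (the function name and learning-rate values are illustrative, not from the post):

    ```python
    def grad_descent(xs, ys, lr=0.05, steps=2000):
        """Fit y ~ w*x + b by gradient descent on mean squared error."""
        w, b = 0.0, 0.0
        n = len(xs)
        for _ in range(steps):
            # Gradients of MSE = (1/n) * sum((w*x + b - y)^2)
            gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
            gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
            w -= lr * gw
            b -= lr * gb
        return w, b
    ```

    On data generated from y = 2x + 1 this recovers w close to 2 and b close to 1; an autograd framework like PyTorch automates exactly the gradient computations written out by hand here.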

  • Open

    Training a Linear Regression Model in PyTorch
    Linear regression is a simple yet powerful technique for predicting the values of variables based on other variables. It is often used for modeling relationships between two or more continuous variables, such as the relationship between income and age, or the relationship between weight and height. Likewise, linear regression can be used to predict continuous […] The post Training a Linear Regression Model in PyTorch appeared first on MachineLearningMastery.com.  ( 24 min )
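    For a single predictor, the model the post trains iteratively also has a closed-form least-squares solution, which makes a useful sanity check on a trained model. A plain-Python sketch (names are illustrative):

    ```python
    def ols_fit(xs, ys):
        """Closed-form least-squares slope and intercept for one predictor."""
        n = len(xs)
        mx = sum(xs) / n
        my = sum(ys) / n
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx = sum((x - mx) ** 2 for x in xs)
        w = sxy / sxx          # slope
        b = my - w * mx        # intercept
        return w, b
    ```

    A gradient-descent fit of the same model should converge to these values.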
    Making Linear Predictions in PyTorch
    Linear regression is a statistical technique for estimating the relationship between two variables. A simple example of linear regression is to predict the height of someone based on the square root of the person’s weight (that’s what BMI is based on). To do this, we need to find the slope and intercept of the line. […] The post Making Linear Predictions in PyTorch appeared first on MachineLearningMastery.com.  ( 21 min )

  • Open

    Loading and Providing Datasets in PyTorch
    Structuring the data pipeline in a way that it can be effortlessly linked to your deep learning model is an important aspect of any deep learning-based system. PyTorch packs everything to do just that. While in the previous tutorial, we used simple datasets, we’ll need to work with larger datasets in real world scenarios in […] The post Loading and Providing Datasets in PyTorch appeared first on MachineLearningMastery.com.  ( 20 min )
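    The batching and shuffling that PyTorch's DataLoader provides can be sketched in a few lines of plain Python; this toy generator (the name `batches` is illustrative) mirrors its core behavior:

    ```python
    import random

    def batches(dataset, batch_size, shuffle=False, seed=0):
        """Yield successive batches of samples, DataLoader-style.
        `dataset` is anything indexable with a length."""
        idx = list(range(len(dataset)))
        if shuffle:
            random.Random(seed).shuffle(idx)
        for start in range(0, len(idx), batch_size):
            yield [dataset[i] for i in idx[start:start + batch_size]]
    ```

    The real DataLoader adds workers, collation into tensors, and samplers on top of this basic loop.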

  • Open

    Using Dataset Classes in PyTorch
    In machine learning and deep learning problems, a lot of effort goes into preparing the data. Data is usually messy and needs to be preprocessed before it can be used for training a model. If the data is not prepared correctly, the model won’t be able to generalize well. Some of the common steps required […] The post Using Dataset Classes in PyTorch appeared first on MachineLearningMastery.com.  ( 21 min )
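    The map-style dataset protocol the post builds on is just `__len__` plus `__getitem__`; a minimal framework-free sketch (the class and its contents are hypothetical examples):

    ```python
    class SquaresDataset:
        """Minimal map-style dataset: implements __len__ and __getitem__,
        the same protocol torch.utils.data.Dataset relies on."""

        def __init__(self, n):
            self.n = n

        def __len__(self):
            return self.n

        def __getitem__(self, i):
            x = float(i)
            return x, x * x   # (input, target) pair
    ```

    Preprocessing steps (normalization, augmentation) typically live inside `__getitem__` so they are applied sample by sample as data is requested.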

  • Open

    Calculating Derivatives in PyTorch
    Derivatives are one of the most fundamental concepts in calculus. They describe how changes in the variable inputs affect the function outputs. The objective of this article is to provide a high-level introduction to calculating derivatives in PyTorch for those who are new to the framework. PyTorch offers a convenient way to calculate derivatives for […] The post Calculating Derivatives in PyTorch appeared first on Machine Learning Mastery.  ( 20 min )
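    The post computes derivatives with PyTorch's autograd; a central-difference approximation is a handy framework-free check to compare such results against (a sketch, with illustrative names):

    ```python
    def numeric_derivative(f, x, h=1e-6):
        """Central-difference approximation of f'(x); error is O(h^2)."""
        return (f(x + h) - f(x - h)) / (2 * h)
    ```

    For f(x) = x^3 this gives roughly 12 at x = 2, matching the analytic derivative 3x^2 that autograd would compute exactly.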

  • Open

    Two-Dimensional Tensors in Pytorch
    Two-dimensional tensors are analogous to two-dimensional matrices. Like a matrix, a two-dimensional tensor has $n$ rows and columns. Let's take a gray-scale image as an example, which is a two-dimensional matrix of numeric values, commonly known as pixels. Each number, ranging from 0 to 255, represents a pixel intensity value. Here, […] The post Two-Dimensional Tensors in Pytorch appeared first on Machine Learning Mastery.  ( 21 min )
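    The gray-scale-image example can be made concrete with a nested list standing in for the two-dimensional tensor: the outer index selects a row, the inner index a column (the pixel values here are made up):

    ```python
    # A tiny 2x3 "gray-scale image": rows x columns of pixel intensities (0-255).
    image = [
        [  0, 128, 255],
        [ 64,  32, 200],
    ]

    rows, cols = len(image), len(image[0])
    pixel = image[1][2]                        # row 1, column 2
    row_means = [sum(r) / cols for r in image] # per-row average intensity
    ```

    A 2-D PyTorch tensor supports the same row/column indexing, plus vectorized operations over whole rows and columns.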

  • Open

    One-Dimensional Tensors in Pytorch
    PyTorch is an open-source deep learning framework based on the Python language. It allows you to build, train, and deploy deep learning models, offering a lot of versatility and efficiency. PyTorch is primarily focused on tensor operations, where a tensor can be a number, a matrix, or a multi-dimensional array. In this tutorial, we will perform some […] The post One-Dimensional Tensors in Pytorch appeared first on Machine Learning Mastery.  ( 22 min )

  • Open

    365 Data Science courses free until November 21
    Sponsored Post   The unlimited access initiative presents a risk-free way to break into data science.     The online educational platform 365 Data Science launches the #21DaysFREE campaign and provides 100% free unlimited access to all content for three weeks. From November 1 to 21, you can take courses from renowned instructors and earn […] The post 365 Data Science courses free until November 21 appeared first on Machine Learning Mastery.  ( 15 min )

  • Open

    Attend the Data Science Symposium 2022, November 8 in Cincinnati
    Sponsored Post      Attend the Data Science Symposium 2022 on November 8 The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all day in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […] The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.  ( 10 min )

  • Open

    My family's unlikely homeschooling journey
    My husband Jeremy and I never intended to homeschool, and yet we have now, unexpectedly, committed to homeschooling long-term. Prior to the pandemic, we both worked full-time in careers that we loved and found meaningful, and we sent our daughter to a full-day Montessori school. Although I struggled with significant health issues, I felt unbelievably lucky and fulfilled in both my family life and my professional life. The pandemic upended my careful balance. Every family is different, with different needs, circumstances, and constraints, and what works for one may not work for others. My intention here is primarily to share the journey of my own (very privileged) family. Our unplanned introduction to homeschooling For the first year of the pandemic, most schools in California, where …  ( 7 min )

  • Open

    The Jupyter+git problem is now solved
    Jupyter notebooks don’t work with git by default. With nbdev2, the Jupyter+git problem has been totally solved. It provides a set of hooks which provide clean git diffs, solve most git conflicts automatically, and ensure that any remaining conflicts can be resolved entirely within the standard Jupyter notebook environment. To get started, follow the directions on Git-friendly Jupyter. Contents The Jupyter+git problem The solution The nbdev2 git merge driver The nbdev2 Jupyter save hook Background The result Postscript: other Jupyter+git tools ReviewNB An alternative solution: Jupytext nbdime The Jupyter+git problem Jupyter notebooks are a powerful tool for scientists, engineers, technical writers, students, teachers, and more. They provide an ideal notebook environment for interact…  ( 7 min )
2023-07-11T00:57:23.271Z osmosfeed 1.15.1